
dc.contributor.author: Downey, Sean S.
dc.contributor.author: Sun, Guowei
dc.contributor.author: Norquest, Peter
dc.date.accessioned: 2017-08-10T16:28:15Z
dc.date.available: 2017-08-10T16:28:15Z
dc.date.issued: 2017-08-10
dc.identifier.citation: alineR: an R Package for Optimizing Feature-Weighted Alignments and Linguistic Distances. Sean S. Downey, Guowei Sun and Peter Norquest, The R Journal (2017) 9:1, pages 138-152.
dc.identifier.issn: 2073-4859
dc.identifier.uri: http://hdl.handle.net/10150/625224
dc.description.abstract: Linguistic distance measurements are commonly used in anthropology and biology when quantitative and statistical comparisons between words are needed. This is common, for example, when analyzing linguistic and genetic data. Such comparisons can provide insight into historical population patterns and evolutionary processes. However, the most commonly used linguistic distances are derived from edit distances, which do not weight phonetic features that may, for example, represent smaller-scale patterns in linguistic evolution. Thus, computational methods for calculating feature-weighted linguistic distances are needed for linguistic, biological, and evolutionary applications; additionally, the linguistic distances presented here are generic and may have broader applications in fields such as text mining and search, as well as applications in psycholinguistics and morphology. To facilitate this research, we are making available an open-source R software package that performs feature-weighted linguistic distance calculations. The package also includes a supervised learning methodology that uses a genetic algorithm and manually determined alignments to estimate 13 linguistic parameters including feature weights and a skip penalty. Here we present the package and use it to demonstrate the supervised learning methodology by estimating the optimal linguistic parameters for both simulated data and for a sample of Austronesian languages. Our results show that the methodology can estimate these parameters for both simulated and real language data, that optimizing feature weights improves alignment accuracy by approximately 29%, and that optimization significantly affects the resulting distance measurements. Availability: alineR is available on CRAN.
dc.description.sponsorship: National Science Foundation [SBS-1030031]; University of Maryland
dc.language.iso: en
dc.publisher: R Foundation for Statistical Computing
dc.relation.url: https://journal.r-project.org/archive/2017/RJ-2017-005/index.html
dc.rights: This article is licensed under a Creative Commons Attribution 4.0 International license.
dc.title: alineR: an R Package for Optimizing Feature-Weighted Alignments and Linguistic Distances
dc.type: Article
dc.contributor.department: University of Arizona, Department of Anthropology
dc.identifier.journal: The R Journal
dc.description.note: Open Access Journal
dc.description.collectioninformation: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
dc.eprint.version: Final published version
refterms.dateFOA: 2018-09-11T22:14:06Z
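The abstract's central technical point is that plain edit distances charge the same cost for every substitution, whereas a feature-weighted distance lets phonetically similar segments (for example, /p/ and /b/) count as closer than dissimilar ones. The Python sketch below illustrates only that contrast, using a simple weighted edit distance. It is not the alignment algorithm implemented in alineR and does not use the package's API; the feature table `FEATURES`, the weights in `WEIGHTS`, the `SKIP_PENALTY` value, and the function names are all hypothetical choices made for illustration.

```python
from typing import Dict

# HYPOTHETICAL phonetic features for a few segments (illustrative only;
# not the feature set or values used by alineR).
FEATURES: Dict[str, Dict[str, float]] = {
    "p": {"place": 1.0, "manner": 0.0, "voice": 0.0},
    "b": {"place": 1.0, "manner": 0.0, "voice": 1.0},
    "t": {"place": 0.8, "manner": 0.0, "voice": 0.0},
    "d": {"place": 0.8, "manner": 0.0, "voice": 1.0},
    "s": {"place": 0.8, "manner": 0.6, "voice": 0.0},
    "a": {"place": 0.5, "manner": 1.0, "voice": 1.0},
    "i": {"place": 0.6, "manner": 1.0, "voice": 1.0},
}

# HYPOTHETICAL feature weights and skip (insertion/deletion) penalty,
# fixed by hand here rather than estimated from data.
WEIGHTS: Dict[str, float] = {"place": 1.0, "manner": 1.0, "voice": 0.5}
SKIP_PENALTY = 1.0


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: every insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def sub_cost(x: str, y: str) -> float:
    """Weighted sum of feature differences; only segments in FEATURES are supported."""
    fx, fy = FEATURES[x], FEATURES[y]
    return sum(w * abs(fx[f] - fy[f]) for f, w in WEIGHTS.items())


def weighted_distance(a: str, b: str, skip: float = SKIP_PENALTY) -> float:
    """Edit distance whose substitution cost depends on weighted feature differences."""
    prev = [j * skip for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [i * skip]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + skip,                        # skip a segment of a
                           cur[j - 1] + skip,                     # skip a segment of b
                           prev[j - 1] + sub_cost(ca, cb)))       # align ca with cb
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    for w1, w2 in [("pat", "bat"), ("pat", "sat")]:
        print(w1, w2, levenshtein(w1, w2), round(weighted_distance(w1, w2), 3))
```

With these made-up values, both word pairs are one edit apart under `levenshtein`, but `weighted_distance` rates "pat"/"bat" (a voicing difference, 0.5) as closer than "pat"/"sat" (about 0.8). In the abstract, estimating quantities such as feature weights and a skip penalty is the job of the package's genetic-algorithm optimization; this sketch simply fixes them by hand.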


Files in this item

Name: RJ-2017-005.pdf
Size: 373.8 KB
Format: PDF
Description: Final Published Version
