Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Hybridization is an important mechanism in evolution. It can be detected by examining the distribution of synonymous substitutions per synonymous site (Ks) among gene pairs in a genome. Traditional methods for examining these Ks plots, such as visual inspection and univariate mixture models, can be difficult to use. Instead, I attempted to create a machine learning algorithm that examines Ks plots for evidence of hybridization and whole genome duplication (WGD). I trained and tested four machine learning classifiers: Support Vector Classification (SVC), Linear Support Vector Classification (Linear SVC), Stochastic Gradient Descent (SGD), and Gaussian Naïve Bayes (Naïve Bayes). I found that SVC was the most accurate classifier and that its accuracy increased with more samples and larger bin sizes. Refining this work will provide a framework for making further inferences about hybridization.
Type
text; Electronic Thesis
Degree Name
B.S.
Degree Level
bachelors
Degree Program
Honors College; Ecology & Evolutionary Biology
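The abstract does not say how a Ks plot is turned into classifier input. The following is a minimal sketch, not the thesis's pipeline. It assumes scikit-learn and assumes that each Ks plot is encoded as a normalized histogram, so the number of bins sets the feature length. The toy data model (an exponential background plus zero, one, or two Gaussian peaks), the three class labels, the `KS_MAX` cutoff, and the helpers `simulate_ks`, `ks_features`, `build_dataset`, and `compare_classifiers` are all hypothetical. The four scikit-learn estimators (`SVC`, `LinearSVC`, `SGDClassifier`, `GaussianNB`) correspond to the classifiers named in the abstract. Any accuracies this script prints describe the toy data only and say nothing about the thesis's results.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

KS_MAX = 3.0  # hypothetical upper Ks cutoff for the histogram


def simulate_ks(n_peaks, rng, n_pairs=2000):
    """Toy Ks sample: exponential background plus n_peaks Gaussian peaks (hypothetical model)."""
    parts = [rng.exponential(scale=0.5, size=n_pairs)]
    for _ in range(n_peaks):
        mean = rng.uniform(0.3, 2.5)  # random peak position
        parts.append(rng.normal(mean, 0.1, size=n_pairs // 2))
    ks = np.concatenate(parts)
    return ks[(ks > 0) & (ks < KS_MAX)]  # keep values inside the plotted range


def ks_features(ks, n_bins):
    """Bin a Ks sample into a histogram normalized to proportions of gene pairs per bin."""
    counts, _ = np.histogram(ks, bins=n_bins, range=(0.0, KS_MAX))
    return counts / counts.sum()


def build_dataset(n_samples, n_bins, seed=0):
    """Balanced toy dataset; label = number of extra peaks (0, 1 or 2), a hypothetical scheme."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 3
        X.append(ks_features(simulate_ks(label, rng), n_bins))
        y.append(label)
    return np.array(X), np.array(y)


def compare_classifiers(n_samples=300, n_bins=50, seed=0):
    """Fit the four classifiers on a train split and return test-set accuracy for each."""
    X, y = build_dataset(n_samples, n_bins, seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "SVC": SVC(),
        "Linear SVC": LinearSVC(dual=False),
        "SGD": SGDClassifier(random_state=seed),
        "Naive Bayes": GaussianNB(),
    }
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)  # scale each bin before fitting
        pipe.fit(X_train, y_train)
        scores[name] = pipe.score(X_test, y_test)  # mean accuracy on held-out samples
    return scores


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: {acc:.2f}")
```

To mirror the abstract's comparison across sample counts and binning, call `compare_classifiers` with different `n_samples` and `n_bins` values.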
