Impact of Rates of Gene Duplication and Domain Shuffling on Species Tree Inference with Gene Tree Parsimony
Advisor: Sanderson, Michael J.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo: Release after 02-Feb-2014
Abstract: Genome sequencing technologies are providing huge quantities of data for phylogenetic inference. However, most phylogenomic studies exclude gene families, because many have a complicated history of gene duplication/loss and structural change by domain shuffling, especially in deep phylogenies. Gene tree parsimony (GTP) methods, which seek the species tree that minimizes the cost of gene duplication, have been successfully applied to gene families with a history of frequent duplication. Their utility and performance for gene families with complex histories of both gene duplication and domain shuffling remain unclear. In this study, we analyzed 4389 gene families from six angiosperm genomes, encompassing a wide range of duplication rates and a broad diversity of domain architectures. Overall species tree inference accuracy increased monotonically with the inclusion of more gene trees, and high accuracy was achieved with 50-100 gene trees. The rate of gene duplication strongly influences species tree inference accuracy, with the highest accuracy at either very low or very high rates of duplication and the lowest accuracy centered around one duplication per branch in the unrooted species tree. This is the opposite of the relationship between substitution rate and tree construction accuracy, in which intermediate rates yield the highest accuracy. Accuracy is generally higher in gene families with high domain architecture diversity but has high variance in families with relatively low domain architecture diversity. The latter is probably due to the high variation in the number of gene duplications among those gene families. We close with some discussion of the potential impacts of domain evolution on phylogenomic reconstruction protocols in general, including its effect on alignment.
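The duplication-cost criterion that GTP minimizes can be illustrated with a minimal Python sketch. It is not the software or procedure used in this study, and it rests on simplifying assumptions: trees are rooted and written as nested tuples, gene tree leaves are labeled by the species they come from, duplications are counted by the standard least-common-ancestor (LCA) mapping, and candidate species trees are scored exhaustively. The function names (`leaf_set`, `clades`, `lca`, `duplication_cost`, `gtp_best`) are hypothetical.

```python
# Illustrative sketch only: rooted trees as nested tuples, leaves are species names.

def leaf_set(tree):
    """Set of species names at the leaves of a tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """All clades of a rooted tree, each as a frozenset of leaf names."""
    out = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            out.extend(clades(child))
    return out

def lca(species_clades, leaves):
    """Smallest species-tree clade that contains all the given leaves."""
    return min((c for c in species_clades if leaves <= c), key=len)

def duplication_cost(gene_tree, species_tree):
    """Number of duplications implied by LCA mapping of gene_tree onto species_tree."""
    sp = clades(species_tree)

    def walk(node):
        # Returns (species clade this node maps to, duplications in its subtree).
        if isinstance(node, str):
            return lca(sp, frozenset([node])), 0
        child_maps, child_dups = zip(*(walk(child) for child in node))
        here = lca(sp, frozenset().union(*child_maps))
        # A node is a duplication if it maps to the same clade as one of its children.
        return here, sum(child_dups) + (1 if here in child_maps else 0)

    return walk(gene_tree)[1]

def gtp_best(gene_trees, candidate_species_trees):
    """Candidate species tree with the lowest total duplication cost, and that cost."""
    scored = [(sum(duplication_cost(g, s) for g in gene_trees), s)
              for s in candidate_species_trees]
    best_cost, best_tree = min(scored, key=lambda pair: pair[0])
    return best_tree, best_cost

# Toy data (made up for illustration): three gene trees, three candidate species trees.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
best_tree, best_cost = gtp_best(gene_trees, candidates)
```

In this toy case the candidate `(("A", "B"), "C")` has the lowest total cost, because only the third gene tree conflicts with it. Practical GTP analyses search tree space heuristically and may also weight losses; exhaustive scoring is only feasible for a handful of taxa.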
Degree Program: Graduate College
Ecology & Evolutionary Biology