Multi-Allele Population Genomics for Inference of Demography and Natural Selection
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: The demographic and evolutionary history of a population leaves an identifiable signature on patterns of genetic variation, so we can learn about demography and natural selection through inference on contemporary polymorphism data. The distribution of sample allele frequencies, known as the allele frequency spectrum (AFS), is an informative statistic that has been used to infer single- and multi-population demographic histories and distributions of fitness effects of new mutations. AFS-based methods typically rely on the infinite sites model, in which loci are assumed to evolve independently and mutations always arise at a previously unmutated site. However, many loci violate these assumptions. Most obviously, loci occupy physical positions on the genome, and neighboring mutations will have correlated allele frequencies. Additionally, some SNPs are found to be multi-allelic, with more than two alleles segregating simultaneously. The assumptions of the infinite sites model force one to ignore or exclude such loci, but these loci are rich in information not captured by standard AFS approaches. With this in mind, I developed a numerical approach for solving a class of multi-allelic diffusion equations that allow for novel inferences on genomic sequence data. First, I considered selection at triallelic nonsynonymous sites to infer the correlation of fitness effects for same-site mutations. I then explored the increase in power afforded to demographic inferences by two-locus allele frequency statistics, in which two biallelic loci are separated by a known recombination distance so that the joint distribution of allele frequencies and linkage disequilibrium may be modeled by a diffusion approximation. Finally, I considered the same two-locus diffusion model but with selection acting on one of the two loci. This allows for the direct modeling of the effects of linked selection on neutral variants, and for potential inference applications such as estimating the parameters of a selective sweep or the distribution of fitness effects.
Degree Program: Graduate College