Statistical Methods for Improving Low Frequency Variant Calling in Cancer Genomics
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
Cancer is not a single disease, but a family of genomic diseases characterized by a set of initiating genomic variants, accumulated in a single cell, that allow that cell to begin dividing uncontrollably. Tumors grow by cell division, and each cell division generates a new set of variants that are passed along to the daughter cells. As a result, at the time of diagnosis a typical tumor of approximately 100,000,000 cells contains hundreds of millions of genomic variants, whose frequency in the population is a function of the time at which they arose. Mutation accumulation through both inheritance and de novo variant production results in a final tumor in which the vast majority of variants are present at low frequency. Current variant-calling methods have difficulty identifying low frequency variants. Here I will describe two algorithms aimed at improving low frequency variant calling in two settings.
Patient-Derived Xenografts (PDXs) serve as avatars for individual patient disease as well as invaluable models for studying basic cancer biology. Molecular characterization of PDXs is common, but the extensive homology between human and mouse genes presents special challenges in sequencing tumors grown in mice. In Chapter 2 I describe an algorithm and R implementation called MAPEX that allows labs studying PDXs to use commercial sequencing technologies and locally filter false positive variants caused by sequence homology.
Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis, and tumor evolution. In Chapter 3 I present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), which extends current state-of-the-art statistical models for tumor variant calling. I also present an R implementation of the algorithm and show, using simulations, that the BATCAVE algorithm improves variant detection.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Biostatistics