Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo
Release after 05/04/2024
Abstract
The microbiome is the genetic material of all the microbes and is involved in many biological functions. Microbes are found on our skin and in our mouth, gut, and genitals. The human microbiome makes an essential contribution to the normal functioning of our bodies and to our health. With growing interest and the falling cost of sequencing, more data are available, which calls for statistical analysis methods. This dissertation focuses on two preprocessing steps for microbial studies: normalization and imputation. Metagenomic time-series studies provide insights into the dynamics of microbial systems. Normalization, the first critical step in microbial count data analysis, is used to account for variable library sizes. However, no existing method appropriately normalizes microbial count data for a time-series study. Here we propose TimeNorm, which considers both the within- and across-time-point structures. In simulation studies under various settings, TimeNorm is shown to surpass existing normalization methods developed for static data. The second project focuses on the sparsity issue. Microbiome data analysis is challenging because of the large number of non-biological zeros, which hinder downstream analysis. To address this challenge, we propose two imputation methods for microbial count data, PhyImpute and UniFracImpute, which identify and impute the non-biological zeros by borrowing information from similar samples. The proposed methods directly incorporate the probability of non-biological zeros and phylogenetic trees into the estimation of sample-to-sample similarity. The proposed imputation methods have demonstrated better performance than the existing methods in comprehensive simulation studies and real data analyses.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Biosystems Engineering