Statistical and Computational Methods for Analyzing Time-Course Metagenomic Sequencing Data
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo: Release after 08/21/2020
Abstract: Next-generation DNA sequencing is widely applied to study the composition of microbial communities. Studies that record the temporal variation of microbial communities provide valuable insight into these communities and their relationships. Currently available statistical methods lack power to detect features that are differentially abundant between biological or medical conditions, particularly for time-course metagenomic sequencing data. Two novel procedures, metaDprof and CorrZIDF, are proposed to address this limitation. metaDprof is a spline-based method that assumes heterogeneous errors; it is designed to meet the challenge of detecting differentially abundant features in metagenomic samples by comparing different biological or medical conditions across time. The approach detects such features globally and also identifies the time intervals in which the changes occur. It requires no prior knowledge of when differences occur and makes no assumption about the pattern type of differentially abundant features. In comprehensive simulations and a real metagenomic data study, metaDprof shows the best performance among the methods compared, in terms of significant feature selection and accurate time-interval detection. CorrZIDF is developed specifically to handle the excessive zeros in metagenomic count data while also including the correlation along time in the model. The large number of zeros may be due to the physical absence or under-sampling of the microbes. CorrZIDF extends the zero-inflated, distribution-free functional response model (FRM) to a longitudinal setting and incorporates estimation of a working correlation structure to increase the relative efficiency of the estimation. The correlation structure is defined through a modified bivariate Pearson correlation estimate based on the FRM mixture.
As a distribution-free approach, CorrZIDF can handle correlated data without specifying the marginal distribution. Based on simulation results and a real data analysis, the proposed method outperforms others across different settings, from moderately to highly correlated data, under different marginal distributions and different working correlation structures.
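The zero-inflation issue described in the abstract can be illustrated with a small simulation. The sketch below is not the CorrZIDF procedure and is not taken from the dissertation; it only shows why counts with two sources of zeros (physical absence and under-sampling) contain more zeros than a single count distribution would predict. The Poisson distribution, the parameter values, and all function names are assumptions chosen for illustration, whereas CorrZIDF itself is described as distribution-free.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson count using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_zero_inflated(rng, mean, structural_zero_prob):
    """Return 0 with probability structural_zero_prob, else a Poisson count."""
    if rng.random() < structural_zero_prob:
        return 0  # structural zero: the microbe is assumed physically absent
    # Microbe present; a zero here would come from under-sampling.
    return sample_poisson(rng, mean)


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


# Hypothetical settings: 40% structural zeros, Poisson mean 3 when present.
rng = random.Random(0)
mean, pi = 3.0, 0.4
counts = [sample_zero_inflated(rng, mean, pi) for _ in range(20000)]

observed_zero_fraction = zero_fraction(counts)
observed_mean = sum(counts) / len(counts)

# Zero probability under the mixture versus under a plain Poisson model.
mixture_zero_prob = pi + (1 - pi) * math.exp(-mean)
poisson_zero_prob = math.exp(-mean)

print(f"observed zero fraction: {observed_zero_fraction:.3f}")
print(f"mixture prediction:     {mixture_zero_prob:.3f}")
print(f"plain Poisson:          {poisson_zero_prob:.3f}")
print(f"observed mean: {observed_mean:.3f}, mixture mean: {(1 - pi) * mean:.3f}")
```

Under these assumed settings the observed share of zeros should track the mixture probability rather than the plain Poisson one, and the sample mean should shrink toward (1 - pi) * mean. A model that ignores the structural zeros would misjudge both quantities.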
Degree Program: Graduate College