Author: Rachid Zaim, Samir
Advisors: Lussier, Yves A.; Zhang, Hao H.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: This dissertation unifies the body of research produced throughout my doctoral training, highlighting three major articles. These projects address how refining and advancing algorithmic methodologies and frameworks in statistics and machine learning (ML) can improve experimental designs and analyses in genomics and transcriptomics, paving the road to interpretable and robust machine learning for precision medicine. The main challenges of applying ML in the omics field are a low signal-to-noise ratio and the curse of dimensionality. A constant theme throughout this dissertation is demonstrating how gene sets (ontologies) can be used to reduce the number of features and improve the signal-to-noise ratio. This dissertation can be succinctly described as ontology-anchored dimension reduction, combined with single-subject (N-of-1) analytics and machine learning, applied to transcriptomics. The culmination of these projects is a final pilot study that brings these concepts together to create robust and interpretable machine learning classifiers for precision medicine, which can be enriched to identify pathways and their interactions. In precision medicine, the goal is to deliver the right treatment, at the right time, to the right person. The aim of my doctoral research is to continue advancing precision medicine by developing cutting-edge statistical and machine learning software and frameworks that improve the state-of-the-art technology available. Building upon the work of colleagues, advisors, and others, this dissertation represents comprehensive efforts from a variety of scientific domains such as informatics, computer science, biology, genetics, mathematics, and, last but not least, statistics. Common themes include experimental design and evaluation, ontologies and knowledge graphs, large-scale significance testing, correlation structures, ensemble learners, and random forests.
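To make ontology-anchored dimension reduction concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the dissertation's method. It assumes each gene already has a per-subject statistic (for example, a log fold change between two conditions measured in the same patient) and that gene sets arrive as a plain mapping from a set name (such as a GO biological process) to its member genes. The function name `gene_set_scores`, the `min_size` cutoff, and the toy data are hypothetical. Each gene set is summarized by the mean of its measured genes, so many gene-level features collapse into one feature per set. Under the simplifying assumption that genes in a set share a common shift plus independent noise, averaging k genes shrinks the noise standard deviation by a factor of √k, which is one way grouping can raise the signal-to-noise ratio.

```python
from statistics import mean


def gene_set_scores(gene_stats, gene_sets, min_size=2):
    """Collapse per-gene statistics into one score per gene set.

    gene_stats: dict of gene -> per-gene statistic (e.g. a log fold change
                between two conditions measured in the same subject).
    gene_sets:  dict of gene-set name -> list of member genes
                (e.g. genes annotated to one GO biological process).
    min_size:   hypothetical cutoff; sets with fewer measured genes are dropped.
    """
    scores = {}
    for name, members in gene_sets.items():
        # Only genes that were actually measured contribute to the score.
        measured = [gene_stats[g] for g in members if g in gene_stats]
        if len(measured) >= min_size:
            scores[name] = mean(measured)
    return scores


# Toy example (hypothetical data): five genes reduced to two gene-set features.
gene_stats = {"G1": 1.0, "G2": 0.5, "G3": 1.5, "G4": -0.25, "G5": 0.25}
gene_sets = {
    "process_A": ["G1", "G2", "G3"],
    "process_B": ["G4", "G5", "G6"],  # G6 was not measured, so it is skipped
    "process_C": ["G1"],              # fewer than min_size genes, so dropped
}
scores = gene_set_scores(gene_stats, gene_sets)
print(scores)  # {'process_A': 1.0, 'process_B': 0.0}
```

The mean is only one possible summary; a median or a set-level test statistic could be substituted without changing the overall shape of the reduction.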
The first chapter introduces the structure of the dissertation. In the second chapter, a numerical study illustrates the increased ability to detect individualized differential gene expression when signal is aggregated by using gene ontologies to group genes by their biological processes. The third chapter borrows from machine learning and mathematics to optimize small-sample and single-subject studies in genomics. A third study, presented in Chapter 4, introduces a novel, effective, and scalable machine learning feature selection algorithm that identifies differential gene products and interactions by combining random forests with correlated Bernoulli trials for large-scale hypothesis testing. The final chapter presents a pilot study that combines all these projects into a proof of concept for creating robust and interpretable machine learning classifiers in small-sample studies for precision medicine. These techniques were all developed and applied to analyze Next Generation Sequencing (NGS) and RNA-sequencing data derived from samples in cohort studies, and knowledge of the underlying biological mechanisms was incorporated from gene ontologies. As is implicit in these works, they represent an interdisciplinary effort that is only possible in team science, allowing for creative solutions when the best minds in statistics, computer science, mathematics, biology, and medicine come together to work on the same problem.
Statistical & Machine Learning Advisor: Helen H. Zhang
Bio- and Clinical Informatics Advisor: Yves A. Lussier
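The idea of testing random-forest feature usage with Bernoulli trials can be illustrated with a simplified Python sketch; it is not the Chapter 4 algorithm. It assumes we already have, for each feature, the number of trees in which that feature was chosen as the root split. Under a uniform null, each of the p features has probability 1/p of being the root split in a tree, and here the trees are treated as independent trials, so each count follows Binomial(n_trees, 1/p). The correlation between trees, which the described method models explicitly, is deliberately ignored. Multiple testing is then controlled with the Benjamini-Hochberg procedure, a standard false discovery rate method swapped in for illustration. All function names and the toy counts are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def root_split_pvalues(root_counts, n_trees):
    """One-sided p-value per feature from how often it was the root split.

    Simplified null: every feature is equally likely to be chosen at the root,
    so each count ~ Binomial(n_trees, 1/p), with trees treated as independent.
    This ignores the between-tree correlation a correlated-Bernoulli model uses.
    """
    p_null = 1.0 / len(root_counts)
    return [binomial_upper_tail(c, n_trees, p_null) for c in root_counts]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value meets its threshold.
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    return set(order[:k_max])


# Hypothetical root-split counts for 10 features over 100 trees.
root_counts = [40, 30, 4, 4, 4, 4, 4, 4, 3, 3]
pvals = root_split_pvalues(root_counts, n_trees=sum(root_counts))
selected = benjamini_hochberg(pvals, alpha=0.05)
print(sorted(selected))  # [0, 1]
```

If the trials are positively correlated, the counts are overdispersed relative to the binomial, so independent binomial p-values like these tend to be too small; accounting for that correlation is the reason to move beyond this simple model.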
Degree Program: Graduate College