Computational Tools and Bioinformatics Pipelines for Analyzing Time-Course RNA-Sequencing Data in Notch-Mediated Liver Fibrosis Reversibility
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
The development of RNA sequencing technologies holds great promise for understanding the transcriptomic landscape. Building upon these innovations, time-course RNA-seq has revolutionized the exploration of temporal transcriptomic profiling, offering deep insights into the dynamic nature of gene regulatory mechanisms. This specialized technology, which entails systematic and longitudinal measurements of gene expression, allows the investigation of both coherent and incoherent transitions throughout the duration of biological stimuli. A pivotal component of bioinformatics tools for time-course RNA-seq data analysis is clustering based on temporal dependency, which is critical for downstream analyses such as co-expression gene network construction and functional enrichment analysis. While a considerable amount of time-course data has been generated, there remains a scarcity of approaches or toolkits capable of precisely assessing transcriptional characteristics in response to time-dependent external stimuli. Thus, there is an urgent need for an improved bioinformatics strategy tailored to the unique demands of time-course RNA-seq clustering analysis. A major barrier to developing effective computational tools is the lack of accurate methods for characterizing pairwise gene similarity. Despite significant efforts to design such metrics, challenges persist due to the unique data structures (e.g., autocorrelation among time points, short time-series) and customized experimental designs inherent to time-course studies. To resolve these challenges, we have developed a novel nonparametric temporal gene expression clustering tool specifically designed for short time-course RNA-seq data. In Chapter 1, we introduced the background and characteristics of time-course experiments and time-course genomic data, discussed their applications, and reviewed existing methods and pipelines. 
We also outlined the major challenges in this field and the key questions we are going to address in this dissertation. In Chapter 2, we developed ES-Graph, a method that combines an ensemble similarity-based pairwise gene similarity metric with graph-based clustering. The similarity component integrates ensemble learning with sketching techniques to capture temporal co-expression, while the graph-based algorithm clusters genes based on biologically meaningful expression patterns. We applied ES-Graph to simulated short time-course bulk RNA-seq datasets to evaluate its performance and accuracy. Through comprehensive benchmarking across varying simulation settings, we demonstrated that ES-Graph performs competitively with existing state-of-the-art methods, especially under conditions of minimal prior cluster information and low signal-to-noise ratios from biological replicates. In Chapter 3, we further demonstrated the biological utility of ES-Graph by applying it to a reversible Notch-induced fibrosis resolution model. RNA-seq data collected across fibrosis progression and regression revealed distinct clusters of co-expressed genes. By integrating extensive publicly available human and mouse single-cell RNA-seq datasets covering various disease states, along with a robust cell type decomposition computational method, we found that fibrosis resolution was accompanied by dynamic shifts in hepatocyte and cholangiocyte proportions. Downstream functional analysis uncovered gene clusters linked to specific signaling pathways and mechanical cues involved in cholangiocyte-hepatocyte trans-differentiation. Furthermore, we elucidated the interactions among novel pathways and “hub” genes that regulate the regressive state of liver fibrosis. Our preliminary results pave the way for dissecting the transcriptional architecture of fibrosis resolution and identifying potential molecular targets for regenerative therapy. 
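As a rough illustration of the general idea described above (an ensemble pairwise similarity feeding a graph-based clustering step), the following is a minimal sketch and not the ES-Graph implementation. The abstract does not specify the ensemble members, the sketching technique, or the graph algorithm, so everything here is an assumption made for illustration: the three similarity scores (Pearson, Spearman, and Pearson on first differences), the 0.8 threshold, connected components as the clustering step, and all function names and toy expression profiles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def ranks(x):
    """Average ranks (ties share the mean rank), used for a Spearman-style score."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def diffs(x):
    """First differences between consecutive time points."""
    return [b - a for a, b in zip(x, x[1:])]


def ensemble_similarity(x, y):
    """Mean of three illustrative similarity scores (an assumed ensemble)."""
    scores = [
        pearson(x, y),                # linear co-expression
        pearson(ranks(x), ranks(y)),  # Spearman: monotone co-expression
        pearson(diffs(x), diffs(y)),  # agreement of step-to-step changes
    ]
    return sum(scores) / len(scores)


def cluster_genes(profiles, threshold=0.8):
    """Link gene pairs with similarity >= threshold; return connected components."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if ensemble_similarity(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy profiles: five time points per gene.
profiles = {
    "geneA": [1.0, 2.0, 4.0, 3.0, 1.5],
    "geneB": [2.1, 4.2, 8.1, 6.0, 3.2],
    "geneC": [5.0, 4.0, 2.0, 2.5, 4.8],
    "geneD": [9.8, 8.1, 4.2, 5.0, 9.5],
}
clusters = cluster_genes(profiles)
print(clusters)
```

In this toy input, geneA and geneB rise and then fall while geneC and geneD do the opposite, so the sketch groups them into two clusters. A realistic analysis would also need to handle biological replicates and normalization, which this sketch omits.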
Collectively, this work presents an improved bioinformatics pipeline for uncovering transcriptional programs underlying dynamic biological processes and offers new insight into the cellular basis of liver repair.
Type
text
Electronic Dissertation
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Genetics
