Novel Deep Learning Methods for Single-Cell RNA-Seq and CITE-Seq Studies
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
In recent years, an increasing number of studies have shown that the expression levels of mRNA molecules (collectively referred to as the "transcriptome") are highly correlated with cell types and states. Initially, RNA expression was quantified using microarrays and later through next-generation sequencing (NGS), in a method known as bulk RNA-seq. While bulk RNA-seq has contributed significantly to biomedical research and clinical discoveries, averaging gene expression across a large number of cells does not provide detailed information about individual cells. To address this limitation, single-cell RNA-seq (scRNA-seq) was developed, enabling researchers to profile RNA expression in individual cells at much higher resolution on a genomic scale. With this powerful tool, more innovative discoveries in biomedicine can be expected. Here, two analytic methods for single-cell research are developed. The first study introduces an effective imputation method called NISC, which uses an autoencoder with a weighted loss function and regularization to denoise scRNA-seq count data. A systematic evaluation shows that NISC is superior to existing imputation methods in handling sparse scRNA-seq count data and improving cell type identification. The second study focuses on CITE-seq data, a type of single-cell multi-omics data that combines scRNA-seq data with surface protein data. By integrating these data sets, researchers can analyze complex big data at multilevel transitions for single cells and uncover novel heterogeneous tissue architectures. However, a critical challenge in CITE-seq data analysis is that the dimensionality of the RNA data is typically thousands of times higher than that of the protein data, which can diminish the impact of protein on downstream clustering. To meet this challenge, an autoencoder-based dimension reduction method, AutoCITE, is developed. It integrates the protein and RNA data, thereby improving the accuracy of downstream cell type identification for CITE-seq data.
Type
Electronic Dissertation
text
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Biosystems Analytics & Technology.