• Evaluating single-subject study methods for personal transcriptomic interpretations to advance precision medicine

      Rachid Zaim, Samir; Kenost, Colleen; Berghout, Joanne; Vitali, Francesca; Zhang, Helen Hao; Lussier, Yves A; Univ Arizona Hlth Sci, Ctr Biomed Informat & Biostat; Univ Arizona, Grad Interdisciplinary Program Stat; Univ Arizona, Dept Math, Coll Sci; Univ Arizona, Ctr Canc (BMC, 2019-07-11)
      Background: Gene expression profiling has benefited medicine by providing clinically relevant insights at the molecular candidate and systems levels. However, to adopt a more "precision" approach that integrates individual variability, including omics data, into risk assessments, diagnoses, and therapeutic decision making, whole-transcriptome expression needs to be interpreted meaningfully for single subjects. We propose an "all-against-one" framework that uses biological replicates in isogenic conditions for testing differentially expressed genes (DEGs) in a single subject (ss) in the absence of an appropriate external reference standard or replicates. To evaluate the proposed framework, we construct reference standards (RSs) with five conventional replicate-anchored analyses (NOISeq, DEGseq, edgeR, DESeq, DESeq2) and treat the remaining samples separately as single-subject sample pairs for ss analyses (without replicates). Results: Eight ss methods (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) for identifying genes with differential expression were compared in Yeast (parental line versus snf2 deletion mutant; n=42/condition) and an MCF7 breast-cancer cell line (baseline versus stimulated with estradiol; n=7/condition). Receiver operating characteristic (ROC) and precision-recall plots were determined for the eight ss methods against each of the five RSs in both datasets. Consistent with prior analyses of these data, ~50% and ~15% DEGs were obtained in the Yeast and MCF7 datasets, respectively, regardless of the RS method. NOISeq, edgeR, and DESeq were the most concordant for creating an RS. Single-subject versions of NOISeq, DEGseq, and an ensemble learner achieved the best median ROC area under the curve when comparing two transcriptomes without replicates, regardless of the RS method and dataset (>90% in Yeast, >0.75 in MCF7). Further, different single-subject methods perform better depending on the proportion of DEGs. Conclusions: The all-against-one framework provides an honest evaluation framework for single-subject DEG studies, since these methods are evaluated, by design, against reference standards produced by unrelated DEG methods. The ss-ensemble method was the only one to reliably produce higher accuracies in all conditions tested in this conservative evaluation framework. However, single-subject methods for identifying DEGs from paired samples need improvement, as no method both achieved precision >90% and obtained moderate levels of recall.
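The evaluation described in this abstract, scoring a single-subject method against a reference standard, can be illustrated with a small sketch. This is not the authors' code: the function names, the toy gene scores, and the 0.5 call threshold are all assumptions made for illustration. It computes the ROC area under the curve as the probability that a reference-standard DEG outranks a non-DEG, plus precision and recall at one threshold.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (DEG, non-DEG) pairs ranked correctly.

    labels: 1 if the reference standard calls the gene a DEG, else 0.
    scores: single-subject method score per gene (higher = more likely DEG).
    Ties between a DEG and a non-DEG count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("reference standard needs both DEGs and non-DEGs")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(labels, calls):
    """Precision and recall of binary DEG calls against the reference standard."""
    tp = sum(1 for l, c in zip(labels, calls) if l and c)
    fp = sum(1 for l, c in zip(labels, calls) if not l and c)
    fn = sum(1 for l, c in zip(labels, calls) if l and not c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: six genes, reference-standard labels and ss scores.
reference_standard = [1, 1, 0, 1, 0, 0]
ss_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]

auc = roc_auc(reference_standard, ss_scores)
calls = [1 if s >= 0.5 else 0 for s in ss_scores]  # assumed threshold
precision, recall = precision_recall(reference_standard, calls)
```

In the framework above, a computation of this kind would be repeated for each single-subject method against each of the five reference standards, and the resulting AUCs summarized by their median.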
    • Single subject transcriptome analysis to identify functionally signed gene set or pathway activity

      Berghout, Joanne; Li, Qike; Pouladi, Nima; Li, Jianrong; Lussier, Yves A; Univ Arizona, Ctr Biomed Informat & Biostat CB2, Dept Med; Univ Arizona, Dept Med, Ctr Appl Genet & Genom Med; Univ Arizona, Grad Interdisciplinary Program Stat; Univ Arizona, CB2; Univ Arizona, Canc Ctr, BIO5 Inst, Dept Med (WORLD SCIENTIFIC PUBL CO PTE LTD, 2018)
      Analysis of single-subject transcriptome response data is an unmet need of precision medicine, made challenging by the high dimension, dynamic nature and difficulty in extracting meaningful signals from biological or stochastic noise. We have proposed a method for single-subject analysis that uses a mixture model for transcript fold-change clustering from isogenically paired samples, followed by integration of these distributions with Gene Ontology Biological Processes (GO-BP) to reduce dimension and identify functional attributes. We then extended these methods to develop functional signing metrics for gene set process regulation by incorporating biological repressor relationships encoded in GO-BP as "negatively regulates" edges. Results revealed reproducible and biologically meaningful signals from analysis of a single subject's response, opening the door to future transcriptomic studies where subject and resource availability are currently limiting. We used inbred mouse strains fed different diets to provide isogenic biological replicates, permitting rigorous validation of our method. We compared significant genotype-specific GO-BP term results for overlap and rank order across three replicate pairs per genotype, and across methods against reference standards (limma+FET, SAM+FET, and GSEA). All single-subject analytics findings were robust and highly reproducible (median area under the ROC curve=0.96, n=24 genotypes × 3 replicates), providing confidence and validation of this approach for analyses in single subjects. R code is available online at http://www.lussiergroup.org/publications/PathwayActivity
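The fold-change clustering step mentioned in this abstract can be sketched with a generic two-component mixture fitted by expectation-maximization. This is a hedged illustration only, not the published R code: the use of Gaussian components, absolute log2 fold changes as input, the variance floor, and all names and toy values are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(values, n_iter=100, var_floor=1e-3):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (means, variances, weights, posteriors), where posteriors[i] is
    the probability that values[i] belongs to the component initialized on
    the upper half of the data (the putative "responsive" transcripts).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    xs = sorted(values)
    half = len(xs) // 2
    # Initialize one component on the lower half, one on the upper half.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    grand = sum(xs) / len(xs)
    spread = max(sum((v - grand) ** 2 for v in xs) / len(xs), var_floor)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    post = [0.5] * len(values)
    for _ in range(n_iter):
        # E-step: posterior probability of the upper component per value.
        post = []
        for v in values:
            lo = weights[0] * normal_pdf(v, means[0], variances[0])
            hi = weights[1] * normal_pdf(v, means[1], variances[1])
            post.append(hi / (lo + hi) if lo + hi > 0 else 0.5)
        # M-step: re-estimate each component from its responsibilities.
        for k, resp in enumerate(([1.0 - p for p in post], post)):
            total = sum(resp)
            if total == 0:
                continue
            means[k] = sum(r * v for r, v in zip(resp, values)) / total
            var = sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / total
            variances[k] = max(var, var_floor)  # floor avoids collapse
            weights[k] = total / len(values)
    return means, variances, weights, post


# Hypothetical absolute log2 fold changes from one isogenic sample pair.
abs_log_fc = [0.05, 0.10, 0.12, 0.08, 0.15, 0.02, 0.11, 0.09, 2.8, 3.1, 3.3]
means, variances, weights, post = fit_two_component_mixture(abs_log_fc)
responsive = [p > 0.5 for p in post]
```

The per-transcript posteriors are the kind of quantity that could then be aggregated over GO-BP gene sets; how the published method performs that integration is not specified in the abstract and is not reproduced here.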
    • Testing for differentially expressed genetic pathways with single-subject N-of-1 data in the presence of inter-gene correlation.

      Schissler, A Grant; Piegorsch, Walter W; Lussier, Yves A; Univ Arizona, Interdisciplinary Program Stat; Univ Arizona, Ctr Biomed Informat & Biostat CB2; Univ Arizona, Inst BIO5; Univ Arizona, Dept Med; Univ Arizona, Dept Math (SAGE PUBLICATIONS LTD, 2017-05-29)
      Modern precision medicine increasingly relies on molecular data analytics, wherein development of interpretable single-subject ("N-of-1") signals is a challenging goal. A previously developed global framework, N-of-1-pathways, employs single-subject gene expression data to identify differentially expressed gene set pathways in an individual patient. Unfortunately, the limited amount of data within the single-subject, N-of-1 setting makes construction of suitable statistical inferences for identifying differentially expressed gene set pathways difficult, especially when non-trivial inter-gene correlation is present. We propose a method that exploits external information on gene expression correlations to cluster positively co-expressed genes within pathways, then assesses differential expression across the clusters within a pathway. A simulation study illustrates that the cluster-based approach exhibits satisfactory false-positive error control and reasonable power to detect differentially expressed gene set pathways. An example with a single N-of-1 patient's triple-negative breast cancer data illustrates use of the methodology.
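The two steps named in this abstract, clustering positively co-expressed genes and then testing across clusters, can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' procedure: clusters are taken as connected components of a thresholded correlation graph, the 0.5 threshold is arbitrary, the test is a one-sample t-statistic on cluster-averaged log fold changes, and all names and values are hypothetical.

```python
import math


def cluster_coexpressed(genes, corr, threshold=0.5):
    """Group genes into connected components of the graph whose edges are
    gene pairs with external correlation above `threshold` (union-find)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for (a, b), r in corr.items():
        if a in parent and b in parent and r > threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return list(clusters.values())


def pathway_t_statistic(log_fc, clusters):
    """One-sample t-statistic (and degrees of freedom) testing whether the
    mean of the cluster-averaged log fold changes differs from zero."""
    cluster_means = [sum(log_fc[g] for g in c) / len(c) for c in clusters]
    k = len(cluster_means)
    if k < 2:
        raise ValueError("need at least two clusters")
    m = sum(cluster_means) / k
    sd = math.sqrt(sum((x - m) ** 2 for x in cluster_means) / (k - 1))
    if sd == 0:
        raise ValueError("cluster means are identical; t-statistic undefined")
    return m / (sd / math.sqrt(k)), k - 1


# Hypothetical pathway of five genes with external correlation estimates.
pathway = ["A", "B", "C", "D", "E"]
external_corr = {("A", "B"): 0.8, ("B", "C"): 0.7, ("D", "E"): 0.1}
# Hypothetical log fold changes from one patient's paired samples.
log_fc = {"A": 1.0, "B": 1.2, "C": 0.8, "D": 1.1, "E": 0.9}

clusters = cluster_coexpressed(pathway, external_corr)
t_stat, df = pathway_t_statistic(log_fc, clusters)
```

Averaging within clusters before testing treats each group of correlated genes as a single observation, which is one way to keep inter-gene correlation from inflating the apparent sample size.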