Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: In longitudinal studies of chronic diseases, the disease states of individuals are often collected at several pre-scheduled clinical visits, but the exact states and the times of transitioning from one state to another between observations are not observed. Data of this kind are commonly referred to as "panel data". Statistical challenges arise in panel data in identifying predictors that govern the transitions between disease states when only a partially observed disease history is available. Continuous-time Markov models (CTMMs) are commonly used to analyze panel data, and allow maximum likelihood estimation without making any assumptions about the unobserved states and transition times. By assuming that the underlying disease process is Markovian, CTMMs yield a tractable likelihood. However, CTMMs generally allow covariate effects to differ across transitions, so the number of coefficients to be estimated is much higher than the number of covariates, and model overfitting can easily happen in practice. In three papers, I develop a regularized CTMM using the elastic net penalty for panel data, and implement it in an R package. The proposed method is capable of simultaneous variable selection and estimation even when the dimension of the covariates is high. In the first paper (Section 2), I use the elastic net penalty to regularize the CTMM, and derive an efficient coordinate descent algorithm to solve the corresponding optimization problem. The algorithm takes advantage of the multinomial state distribution under the non-informative observation scheme assumption to simplify computation of key quantities. A simulation study shows that this method can effectively select true non-zero predictors while reducing model size. In the second paper (Section 3), I extend the regularized CTMM developed in the first paper to accommodate exact death times and censored states.
Death is commonly included as an endpoint in longitudinal studies, and the exact time of death can be easily obtained, but the state path leading to death is usually unknown. I show that exact death times result in a very different form of likelihood, and the dependency of death time on the model requires significantly different numerical methods for computing the derivatives of the log likelihood, a key quantity for the coordinate descent algorithm. I propose to use numerical differentiation to compute these derivatives. Computation of the derivatives of the log likelihood from a transition involving a censored state is also discussed. I carry out a simulation study to evaluate the performance of this extension, which shows consistently good variable selection properties and prediction accuracy comparable to that of the oracle models, in which only the true non-zero coefficients are fitted. I then apply the regularized CTMM to the airflow limitation data from the TESAOD (The Tucson Epidemiological Study of Airway Obstructive Disease) study with exact death times and censored states, and obtain a prediction model with great size reduction from a total of 220 potential parameters. The methods developed in the first two papers are implemented in an R package, markovnet, and a detailed introduction to the key functionalities of the package is given with a simulated data set in the third paper (Section 4). Finally, some concluding remarks are given and directions for future work are discussed (Section 5). The outline for this dissertation is as follows. Section 1 presents an in-depth background regarding panel data, CTMMs, and penalized regression methods, as well as a brief description of the TESAOD study design. Section 2 describes the first paper, entitled "Regularized continuous-time Markov model via elastic net". Section 3 describes the second paper, entitled "Regularized continuous-time Markov model with exact death times and censored states".
Section 4 describes the third paper, entitled "Regularized continuous-time Markov model for panel data: the markovnet package for R". Section 5 gives an overall summary and a discussion of future work.
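As a rough illustration of the kind of objective the abstract describes, the following Python sketch evaluates an elastic-net-penalized negative log likelihood for panel data under a two-state CTMM. It is not the markovnet implementation, and several details are assumptions made only for this example: the chain has just two states (so the transition probabilities have a closed form), the transition rates depend on a single covariate through a log-linear link, the intercepts are left unpenalized, and the penalty is scaled as lam * (alpha * L1 + 0.5 * (1 - alpha) * L2). All function and parameter names are hypothetical.

```python
import math

def transition_probs(a, b, t):
    """Closed-form P(t) = exp(Qt) for a two-state chain.

    a is the rate from state 0 to state 1, b the rate from state 1 to state 0.
    """
    total = a + b
    decay = math.exp(-total * t)
    p00 = (b + a * decay) / total
    p11 = (a + b * decay) / total
    return [[p00, 1.0 - p00], [1.0 - p11, p11]]

def panel_log_likelihood(visits, a, b):
    """Log likelihood of one individual's panel data.

    visits is a list of (time, state) pairs; only the states at the visit
    times are used, and nothing is assumed about the path in between.
    """
    ll = 0.0
    for (t0, s0), (t1, s1) in zip(visits, visits[1:]):
        ll += math.log(transition_probs(a, b, t1 - t0)[s0][s1])
    return ll

def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty (assumed scaling, see the lead-in)."""
    l1 = sum(abs(c) for c in coefs)
    l2 = sum(c * c for c in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def penalized_objective(data, beta, lam, alpha):
    """Average negative log likelihood plus the penalty on the slopes.

    data is a list of (x, visits); beta holds one intercept and one slope
    per transition (assumed log-linear rates), so a single covariate
    already contributes two coefficients.
    """
    nll = 0.0
    for x, visits in data:
        a = math.exp(beta["a0"] + beta["a1"] * x)
        b = math.exp(beta["b0"] + beta["b1"] * x)
        nll -= panel_log_likelihood(visits, a, b)
    return nll / len(data) + elastic_net_penalty([beta["a1"], beta["b1"]], lam, alpha)

# Toy data: two individuals, each seen at three pre-scheduled visits.
data = [
    (0.0, [(0.0, 0), (1.0, 0), (2.0, 1)]),
    (1.0, [(0.0, 0), (1.0, 1), (2.5, 1)]),
]
beta = {"a0": -1.0, "a1": 0.5, "b0": -2.0, "b1": 0.0}
print(penalized_objective(data, beta, lam=0.1, alpha=0.5))
```

A coordinate descent routine would minimize this objective one coefficient at a time. With more than two states the closed form is no longer available, and the transition probabilities would typically be obtained from a matrix exponential instead.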
Degree Program: Graduate College