Simultaneous Change-point Detection and Curve Estimation for Single and Multiple Sequential Data
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract
In this dissertation, I propose a new class of nonparametric regression methods that account for discontinuities in single and multiple sequential data. The new method, called Simultaneous CHange-point detection And Curve Estimation (SCHACE), automatically detects jumps in data sequences and, at the same time, accurately captures nonlinear trends in the mean curve between these jumps. SCHACE is a unified regularization framework that integrates two statistical tools: the normalized fused LASSO for change-point detection and B-splines for curve estimation. Notably, this approach is a single-step method that does not require iteration and is therefore computationally efficient and fast to implement. To evaluate the performance of SCHACE, I conduct extensive numerical experiments on both simulated and real-world data, which demonstrate its advantages over competing methods in the literature. Furthermore, I study the problem of detecting change-points in multiple data sequences simultaneously. To this end, I propose two variants that generalize SCHACE by integrating the group LASSO for change-point detection in multiple sequences with B-splines for nonparametric regression curve estimation. The first variant is a direct extension of SCHACE to multiple sequences, which selects the degrees of freedom for the B-splines by a grid-based tuning procedure. This method assumes that the means of the multiple sequences share the same level of smoothness. To relax this assumption, I propose the second variant, which allows for different levels of smoothness across multiple curves. It first assigns relatively high degrees of freedom to the B-splines for each sequence and then imposes a penalty on the coefficients of the B-spline basis matrix. In this way, the degrees of freedom of the B-splines for each sequence can be selected independently by solving an optimization problem.
Type
Electronic Dissertation
text
Degree Name
Ph.D.
Degree Level
doctoral
Degree Program
Graduate College
Statistics