Statistical Approaches for Handling Missing Data and Defining Estimands using Responder Analysis in Clinical Trials
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Embargo: Release after 05/10/2021
Abstract: Outcomes from clinical trials are often evaluated by a difference in group means. However, statistical significance of the comparison can be achieved even if the magnitude of the difference is small. For example, consider a weight loss trial where patients in the treatment group lose, on average, four pounds more than those in the control group. That difference, even if statistically significant, may not represent an amount sufficient to improve the patient's quality of life or lower the risk of disease. Clinical trial outcomes also can be evaluated by comparing the proportions of patients achieving a successful response. In so-called responder analysis, subjects are classified as responders if they improve by a specified threshold, often by dichotomizing a continuous outcome. This threshold usually represents the minimal amount of change that is either meaningful to the patient or clinically relevant. Missing data are an ever-present problem in longitudinal trials because they can bias trial results. In responder analysis, subjects with missing outcomes often are imputed as non-responders. This approach, recommended by regulatory guidance, may be perceived as conservative, although this is not always the case. This dissertation is motivated by the lack of statistically principled approaches to handling missing data in responder analysis in randomized controlled trials. The goal of this dissertation was to provide recommendations for handling missing data in clinical trials using responder analysis that are statistically principled, straightforward for a wide range of analysts, and implementable in standard statistical software. Additionally, we challenged the currently recommended method of imputing missing observations as non-response. Aim 1 was to evaluate imputation methods for responder analysis.
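As a minimal illustration of the dichotomization and non-responder imputation described above, the following sketch (all numbers hypothetical, one arm only) shows why counting missing outcomes as failures pulls the estimated responder rate below the true rate when dropout is related to the outcome:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical change-from-baseline scores in one trial arm.
change = rng.normal(loc=8.0, scale=5.0, size=n)

# Responder analysis: dichotomize at an assumed clinically meaningful threshold.
threshold = 5.0
responder = change >= threshold

# MAR-style dropout: subjects who improve less are more likely to be missing.
p_miss = 1.0 / (1.0 + np.exp(0.4 * (change - 4.0)))
missing = rng.random(n) < p_miss

true_rate = responder.mean()                  # rate if nothing were missing
nri_rate = (responder & ~missing).mean()      # missing counted as non-responders

print(f"true responder rate:        {true_rate:.3f}")
print(f"non-responder-imputed rate: {nri_rate:.3f}")
```

Because every unobserved responder is reclassified as a failure, the non-responder-imputed rate can only stay the same or fall, which is the source of the negative bias examined in Aim 1.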
We simulated data representing a two-arm randomized controlled trial, generated dropout using two missing at random mechanisms, and varied response profiles and the percentage of missing data. We imputed missing observations three ways: 1) replacing missing responder status as non-responder; 2) using the best linear unbiased predictors (BLUPs) from a longitudinal mixed model prior to determining responder status; and 3) multiply imputing responder status. We then assessed bias, power, and type 1 error. Based on these simulations, we showed that imputing missing values as non-response underestimated the between-group difference in responder proportions in all scenarios, leading to substantial negative bias and reduced power. Estimates of the difference in responder proportions using BLUP imputation were relatively unbiased for most scenarios; however, power and type 1 error were slightly inflated in some scenarios. Using multiple imputation, estimates were slightly positively biased for all scenarios, increasing as the percentage of missing data increased, to a maximum bias of 7.0%. Aim 2 sought to evaluate the specification of the imputation model when a continuous outcome is dichotomized. In Aim 1, we showed that multiple imputation is a good choice for responder analysis with missing data but found no clear recommendation on how best to specify the imputation model. For example, practitioners can either impute the missing outcome before dichotomizing or dichotomize and then impute. For Aim 2, we compared multiple imputation of the continuous and dichotomous forms of the outcome with imputing missing responder status as non-response. We simulated data from a two-arm randomized trial, omitted responses using six missing at random mechanisms, and imputed missing outcomes three ways: 1) replacing as non-responder; 2) multiply imputing before dichotomizing; and 3) multiply imputing the dichotomized response.
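The impute-before-dichotomizing strategy can be sketched as follows. This is a simplified stochastic regression imputation with a single baseline covariate (hypothetical data model; a proper multiple imputation would also draw the regression parameters from their posterior, which is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 1500, 20  # subjects and number of imputed data sets

# Hypothetical trial data: baseline covariate and correlated change score.
baseline = rng.normal(0.0, 1.0, n)
change = 4.0 + 2.0 * baseline + rng.normal(0.0, 3.0, n)
threshold = 3.0
true_rate = float((change >= threshold).mean())

# MAR dropout driven only by the always-observed baseline covariate.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(1.5 - baseline))
obs = ~missing

# Fit the imputation model change ~ baseline on the observed cases.
Xo = np.column_stack([np.ones(obs.sum()), baseline[obs]])
beta, *_ = np.linalg.lstsq(Xo, change[obs], rcond=None)
resid_sd = float(np.std(change[obs] - Xo @ beta, ddof=2))

Xm = np.column_stack([np.ones(missing.sum()), baseline[missing]])
rates = []
for _ in range(M):
    y = change.copy()
    # Stochastic draw for each missing value (simplified: beta held fixed).
    y[missing] = Xm @ beta + rng.normal(0.0, resid_sd, missing.sum())
    rates.append(float((y >= threshold).mean()))  # dichotomize after imputing

mi_rate = float(np.mean(rates))  # pooled responder proportion
```

Because the imputation model conditions on the covariate that drives the MAR dropout, the pooled proportion lands close to the full-data rate, in contrast to the non-response imputation above.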
We found that both forms of multiple imputation performed better than non-response imputation in terms of bias and type 1 error. When approximately 30% of responses were missing, bias was less than 7.3% for all multiple imputation scenarios, but imputing before dichotomizing generally had slightly lower bias than dichotomizing before imputing. Non-response imputation resulted in biased estimates, both underestimates and overestimates. Aim 3 was to compare multiple imputation (MI) under missing at random (MAR) and missing not at random (MNAR) assumptions relative to defined estimands using responder analysis. Advances in the methodology of principled approaches to analyzing incomplete data have highlighted the need to better define trial estimands. Estimands encompass four components: the research objective, the target population, the analytical approach, and the handling of post-randomization events, including missing data and dropout. For Aim 3, we defined estimands for responder analysis and demonstrated the use of MAR and MNAR imputation methods within an estimand framework. To do this, we simulated data, including adherence indicators and dropout, from a two-arm randomized controlled trial measuring a continuous longitudinal outcome that was dichotomized for responder analysis. In one set of simulations, we considered the true de jure difference in proportions to be from data where all subjects adhered to the protocol. In other simulations, we considered the true de facto difference in proportions to be from a combination of subjects who adhered and subjects who did not adhere but were fully observed. We evaluated bias relative to the true value and linked each imputation method to an estimand. We found that, within all scenarios with dropout (regardless of the true value), the estimates from standard multiple imputation and those from MNAR multiple imputation were each stable but differed from each other.
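One common way to operationalize an MNAR assumption such as "those who drop out are nonadherent" is delta adjustment, in which MAR-based imputed values are penalized by a fixed shift before dichotomizing. The dissertation's exact MNAR method is not detailed in this abstract, so the following is only an illustrative sketch under that assumption (hypothetical data model and penalty):

```python
import numpy as np

rng = np.random.default_rng(2)
n, M, delta = 1500, 20, 2.0  # delta: assumed outcome penalty for dropouts

baseline = rng.normal(0.0, 1.0, n)
change = 4.0 + 2.0 * baseline + rng.normal(0.0, 3.0, n)
threshold = 3.0

# MAR-style dropout driven by the observed baseline covariate.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(1.5 - baseline))
obs = ~missing

# Fit the MAR imputation model change ~ baseline on observed cases.
Xo = np.column_stack([np.ones(obs.sum()), baseline[obs]])
beta, *_ = np.linalg.lstsq(Xo, change[obs], rcond=None)
resid_sd = float(np.std(change[obs] - Xo @ beta, ddof=2))
Xm = np.column_stack([np.ones(missing.sum()), baseline[missing]])

mar_rates, mnar_rates = [], []
for _ in range(M):
    draw = Xm @ beta + rng.normal(0.0, resid_sd, missing.sum())
    y_mar, y_mnar = change.copy(), change.copy()
    y_mar[missing] = draw           # standard MAR imputation
    y_mnar[missing] = draw - delta  # delta-adjusted: dropouts fare worse
    mar_rates.append(float((y_mar >= threshold).mean()))
    mnar_rates.append(float((y_mnar >= threshold).mean()))

mar_rate = float(np.mean(mar_rates))
mnar_rate = float(np.mean(mnar_rates))
```

The delta-adjusted analysis deliberately shifts dropouts toward non-response, so its pooled responder rate sits below the MAR rate; varying delta is a standard way to probe sensitivity to the MAR assumption.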
Responder analysis using standard multiple imputation estimated the difference in responder proportions due to treatment taken as directed, making it the best imputation choice for the de jure estimand. Likewise, responder analysis using MNAR multiple imputation best characterized the difference in responder proportions due to the treatment regimens as actually taken (assuming that those who drop out are nonadherent), making it a good choice for the de facto estimand. The outline of this dissertation is as follows. Chapter 1 presents an overview of missing data in randomized trials as applied to responder analysis, including approaches to address missing data. We review estimands and considerations for estimands using responder analysis in clinical trials. Chapter 2 is the first paper, which evaluates approaches to missing data in responder analysis. Chapter 3 extends the multiple imputation approach to responder analysis with missing data and considers variations of the imputation model and imputing missing values as non-response. Chapter 4 presents simulation results when an estimand framework is applied to responder analysis and includes definitions for sensitivity analysis through a missing not at random analytic approach. The final chapter discusses the conclusions of this body of work.
Degree Program: Graduate College