• Evaluation of the Measurement Properties of the Short Form 36 Version 2 Health Survey in a Sample of Patients with Multiple Sclerosis

      Malone, Daniel; Khalaf, Kristin Marie; Slack, Marion; Warholak, Terri; Coyne, Karin; Reeve, Bryce; Malone, Daniel (The University of Arizona., 2016)
      Background: In health status assessment, patient-reported outcome (PRO) measures are tools used to elicit important and measurable information from patients to better understand the impact of health conditions on their lives. Such impacts are considered latent constructs, or variables that cannot be observed or measured directly. Instruments intended to assess latent constructs must satisfy certain development, psychometric, and scaling standards through the generation of both qualitative and quantitative evidence to demonstrate the adequacy of their measurement properties. Health-related quality of life (HRQOL), or the subjective perception of health, is a core concept within the field of PROs. The Short Form 36 (SF-36) is one of the most commonly used PRO measures for assessing HRQOL. Objectives: To provide a better understanding of the performance and dimensionality of the SF-36 version 2 in a cross-sectional sample of patients with multiple sclerosis (MS) at the item, subscale, and higher-order factor structure levels using different measurement methods grounded in classical test theory (CTT), factor analysis, and item response theory (IRT). Methods: This was a post hoc analysis of a cross-sectional dataset. Patients with MS were recruited to participate in an online survey asking a variety of questions related to their health and treatment-seeking behaviors. The SF-36 was one of the questionnaires included in the survey. Items and individual subscales were evaluated using a multi-trait/multi-item correlation matrix to assess item-to-subscale relationships, including item discriminant validity with other subscales. Unidimensionality for select SF-36 subscales was assessed through confirmatory factor analysis (CFA). Internal consistency reliability (Cronbach's alpha) was evaluated for each subscale.
Patient-reported disability, depression, and current symptom exacerbation status were evaluated relative to SF-36 subscale scores to assess convergent validity, discriminant validity, and known-groups validity. Higher-order factor models of the SF-36 were tested to evaluate dimensionality of the instrument, including a two-factor second-order factor model, a bifactor model, and a statistical comparison between the bifactor model and its corresponding nested model. Unidimensionality was further evaluated through the use of graded response IRT models. The relative fit of traditional versus discrimination-constrained models was tested using a -2 log-likelihood ratio test, followed by an evaluation of item-level properties for fit (S-X² statistics), local dependence, and further assessment of model parameters (discrimination parameters, location parameters, option response functions, and test information curves). Person location parameters were also estimated to compare scale information to the location of patients along the latent construct. Results: A total of 1,052 respondents completed the survey. In the CFA of unidimensionality, all individual subscales had comparative fit index [CFI] values > 0.90, but root mean square error of approximation [RMSEA] values all exceeded 0.08. All IRT graded response models showed a statistically significant improvement in model fit when item discrimination was freely estimated. Every unidimensional scale tested with the IRT models had at least one misfitting item (S-X² p-value > 0.05), and nearly all subscales tested showed item pairs with signs of local dependence. Cronbach's alpha was > 0.80 for all subscales except for General Health [GH] (alpha = 0.78). SF-36 subscales most closely related to physical aspects of health status had the strongest relationship to disability status (physical functioning [PF], r = -0.82, and role physical [RP], r = -0.57).
Subscales more closely related to mental health had the largest effect sizes between patients with versus without depression (0.88 for the mental health [MH] subscale) and the smallest effect sizes between patients reporting currently experiencing versus not experiencing an exacerbation of their symptoms (0.48 for the role emotional [RE] subscale). Both CFA and IRT analyses showed a lack of compelling evidence supporting unidimensionality when items from the PF, RP, bodily pain [BP], and GH subscales were combined to form the Physical-21, and when items from the vitality [VT], RE, social functioning [SF], and MH subscales were combined to form the Mental-14. Higher-order factor models showed good model fit, with CFI > 0.90 in all cases and lower RMSEA values than seen for the individual subscales (0.077 to 0.107). The bifactor model fit significantly better than its nested second-order version; however, the best-fitting (i.e., highest CFI and lowest RMSEA) higher-order factor model was the preliminary first-order model with eight first-order factors consistent with the eight subscales of the SF-36 (CFI = 0.996, RMSEA = 0.077, X² = 3872.14, p < 0.001). Conclusions: The SF-36 version 2 performed well when evaluated within the CTT framework, but both CFA and IRT methods revealed several limitations at the item and factor level across all subscales, due to item wording (i.e., positive versus negative), items not being sufficiently related to their latent construct, and local dependence of items within and across subscales. The appropriateness of equal weighting of responses to produce a single summary score for each subscale, as well as their further aggregation into the Physical Component Summary and Mental Component Summary scores, should be reevaluated.
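
Illustrative note: the internal consistency statistic reported above, Cronbach's alpha, can be sketched in a few lines. The example below is a minimal illustration only, not the analysis code used in the study; the function name `cronbach_alpha` and the toy responses are hypothetical. It uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score), where k is the number of items in a subscale.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: one list per respondent, each holding that respondent's
    scores on the k items of the subscale (hypothetical input layout).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's summed subscale score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (invented for illustration): 5 respondents, 3 items.
toy = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items vary together; the 0.80 threshold mentioned in the Results is a commonly used benchmark rather than something this sketch enforces.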