Out-of-level testing for special education students participating in large-scale achievement testing: A validity study
Author: Brown, Laureen Kay
Advisor: Sabers, Darrell L.
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Abstract: The purpose of this study was to examine the reliability and validity of out-of-level (OOL) testing for students with mild cognitive disabilities participating in large-scale accountability assessments. Federal law now requires maximum participation of students with disabilities in these assessments, and OOL testing is one method used to accomplish this mandate. However, the prevalence, reliability, and validity of this practice have not been established. This study involved the analysis of second through eighth grade students' OOL and grade-level (GL) Stanford 9 reading and math subtest data. Raw data were collected by the district studied as part of an annual state-mandated testing program. Participation rates and methods of participation for students with Specific Learning Disability (SLD) and Mild Mental Retardation (MIMR) were examined over a five-year period. Results indicated that the number of MIMR and SLD students participating in Stanford 9 testing increased by more than 700% from 1998 to 2002. The use of OOL tests also increased substantially during that period. With regard to reliability, results indicated that KR-20 coefficients were comparable across regular education GL and Special Education OOL test groups. In addition, comparable percentages of students in GL and OOL groups scored within the test's reliable range. Special Education students were not given tests that were too easy as a result of OOL testing options. Validity evaluation included comparisons of modified caution indices (MCI) and point-biserial correlations for matched GL and OOL groups, as well as differential item functioning (DIF) analyses. MCI and point-biserial analyses provided no evidence of differential validity for GL and OOL groups. Although DIF analyses identified more items as functioning differently across groups (GL vs. OOL) than would be expected by chance, no systematic patterns of bias resulting from the OOL test administration condition were identified.
OOL testing was determined to be an appropriate method of achievement testing for students with SLD. True differences between OOL and GL groups, as well as differences in test administration other than the OOL versus GL condition, are discussed. Recommendations regarding OOL testing policy, stakeholder education, test development and reporting practices, and future research are included.
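For readers unfamiliar with the reliability statistics named above, the following is a minimal illustrative sketch (not taken from the dissertation) of how a KR-20 coefficient and an item point-biserial correlation are typically computed from a 0/1 scored item-response matrix; the function names and the toy data are hypothetical.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomously scored items.

    responses: 2-D array, rows = examinees, columns = items, entries 0/1.
    KR-20 = (k / (k - 1)) * (1 - sum(p*q) / var(total scores)).
    """
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                   # number of items
    p = responses.mean(axis=0)               # proportion correct per item
    item_var_sum = (p * (1 - p)).sum()       # sum of item variances p*q
    total_var = responses.sum(axis=1).var()  # population variance of totals
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def point_biserial(item_scores, total_scores):
    """Point-biserial: Pearson correlation of a 0/1 item with total score."""
    return np.corrcoef(item_scores, total_scores)[0, 1]

# Toy data: 4 examinees, 3 items
resp = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [1, 0, 0],
                 [0, 0, 0]])
print(kr20(resp))  # 0.75 for this toy matrix
```

Comparing KR-20 values for GL and OOL groups, as the study does, simply means computing this coefficient separately on each group's response matrix and contrasting the results.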
Degree Program: Graduate College