The Development and Validation of a Reliable Alternate Form for Raven's Standard Progressive Matrices

The development of an alternate form for Raven's Standard Progressive Matrices (SPM) Test is described. For each of the 60 original SPM items, a new item was developed to be comparable to the corresponding original item in terms of underlying solution strategy and difficulty. An alternate form reliability analysis on a diverse group of 449 children (77 African-American, 122 Asian, 54 Filipino, 156 Latino/Hispanic, 38 white, and 2 other) yielded an alternate form reliability coefficient of .90. Kuder-Richardson reliabilities of the newly developed alternate form and the SPM were identical at .94. A Kolmogorov-Smirnov two-sample test, moreover, revealed no significant differences in central tendency, dispersion, or skewness between the distributions of individual item difficulties. In addition, the two tests showed comparable predictive validity coefficients. The alternate form resolves one limitation of the SPM and could prove widely useful as a research tool.

The Raven Progressive Matrices Tests, including the Standard (SPM) and Advanced (APM) versions, are among the most widely used and referenced of the standardized psychometric tests of ability (Carpenter, Just, & Shell, 1990; Kaplan & Saccuzzo, 1993; Raven, 1989). The SPM, for example, is extensively used in clinical, neuropsychological, educational, and research settings (Carpenter et al., 1990; Cherkes-Julkowski, Stolzenberg, & Segal, 1990; Larson, Alderton, & Kaupp, 1991). Of the psychometric ability tests, the SPM and related versions are among the best measures of Spearman's (1927) g factor (Marshalek, Lohman, & Snow, 1983; Snow, Kyllonen, & Marshalek, 1984). Indeed, perhaps more is known about what the SPM and related versions measure than about any other ability test. Positron emission tomography (PET) studies have revealed that the entire brain (especially the right cerebral hemisphere, left temporal lobe, and left frontal lobe) is used in solving APM problems (Haier et al., 1988). Theoretical reviews have pointed to the SPM and related versions as measures of the "ability to reason and solve problems involving new information, without relying extensively on an explicit base of declarative knowledge" (Carpenter et al., 1990, p. 404).
Until recently, the application of the SPM was limited by the absence of adequate norms and of an alternate form. Now that an extensive and relatively current set of U.S. and worldwide norms is available (Raven, 1986), the first of these limitations has been overcome. In addition, to facilitate the research use of the SPM and permit matching across its various forms, Jensen, Saccuzzo, and Larson (1988) developed a set of standards for equating the SPM and APM. To date, however, the absence of an alternate form remains a limitation.
An alternate form is needed for a variety of reasons. One such reason is the measurement of intervention effects. In neuropsychological research, for example, it is often desirable to evaluate the effects of an intervention within a time frame that is shorter than the optimal interval for a test-retest paradigm. In such cases, an alternate form would be extremely useful for evaluating intervention effects while minimizing practice effects. In our own research, which involves group testing, an alternate form is needed to replace invalid test administrations. Thus, an alternate form has potential utility as a research tool in neuropsychology, education, intellectual assessment, and other areas of psychological inquiry.
In the present study, we report on the development, validity, and reliability of an alternate form of the SPM called the San Diego Test of Reasoning Ability (SANTRA) (Johnson et al., 1993). In developing this alternate form, we began with the 12 item rules explicated by Jacobs and Vandeventer (1972), such as "shading" (i.e., progressive change in shading), "number series" (i.e., constant increase in items across cells), and "mirror image" (i.e., figure moves as if lifted up and replaced face down). For each of the 60 items in the SPM, we determined the item rule(s) underlying solution of the matrix. Two researchers discussed the rule(s) and attempted to develop an entirely new item that was based on the same item rule(s) and designed to be of comparable difficulty and complexity. The item was then submitted to a third researcher, who independently checked it against the underlying rule(s). If the third researcher disagreed, there was further discussion and a new item was submitted. This process continued until there was complete agreement. Our initial psychometric evaluation of the alternate form is based on a cross section of children from a large, diverse school district.

Participants
The participants were 449 second-, fifth-, and seventh-grade students. Permission was obtained from the school district to select entire classes believed to reflect the diversity of the 123,000 children in the San Diego City School District. In this sample, 77 children were African-American, 122 Asian, 54 Filipino, 156 Latino/Hispanic, 38 white, and 2 other. Of these, 214 were boys and 233 were girls; gender data were missing for 2. The mean age of the children was 11 years (range = 6 years, 8 months to 13 years, 10 months).

Procedure
Parents of the children provided written informed consent for participation in the study. Each child was administered both the SPM and the alternate form (SANTRA). Approximately half of the children (216) were randomly assigned to an SPM-then-alternate order of test administration and the remaining 233 to an alternate-then-SPM order. Upon finishing the first test, each participant turned in the test protocol and was given the form not yet taken. Testing was untimed; children were told to work as quickly and accurately as possible but that there was no time limit. Total testing time ranged from approximately 1.5 to 2.5 hours.
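The counterbalanced design above can be illustrated with a short sketch. The article does not say how the random assignment was carried out, so the function name `assign_orders`, the shuffle-then-split approach, the participant IDs, and the seed are all hypothetical; only the group sizes (216 and 233 of 449) come from the study.

```python
import random


def assign_orders(participant_ids, n_spm_first, seed=None):
    """Hypothetical helper: randomly split participants into two test orders.

    The first n_spm_first participants after shuffling take the SPM first;
    everyone else takes the alternate form (SANTRA) first.
    """
    if not 0 <= n_spm_first <= len(participant_ids):
        raise ValueError("n_spm_first must be between 0 and the number of participants")
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # random permutation of the participants
    return {
        pid: ("SPM-SANTRA" if i < n_spm_first else "SANTRA-SPM")
        for i, pid in enumerate(shuffled)
    }


# Hypothetical IDs for the 449 children; 216 take the SPM first, as in the study.
ids = [f"child_{i:03d}" for i in range(1, 450)]
orders = assign_orders(ids, n_spm_first=216, seed=1993)
print(sum(o == "SPM-SANTRA" for o in orders.values()))  # 216
print(sum(o == "SANTRA-SPM" for o in orders.values()))  # 233
```

Shuffling once and splitting at a fixed index guarantees the exact group sizes, whereas flipping a coin per child would only give approximately equal groups.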

Results
The overall correlation (Pearson r) between the SPM and the newly developed alternate form (i.e., the alternate form reliability coefficient) was .90 (p < .01). The Kuder-Richardson (KR-20) reliabilities for the SPM and alternate forms were .9447 and .9442, respectively. Thus, the two forms were highly comparable in internal consistency. To examine item difficulty for the two forms, we calculated the percentage of children who responded correctly to each of the 60 items on each form (see Table 1). Distributions of item difficulty were compared using a Kolmogorov-Smirnov two-sample test. As Siegel (1956) noted, this nonparametric test "is sensitive to any kind of difference in the distributions from which the two samples [of items] were drawn: differences in location (central tendency), in dispersion, in skewness, etc." (p. 127). The results showed no statistically significant difference (p > .36) between the two distributions of individual item difficulties, thus supporting the comparability of the two forms.
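The three item-level quantities used in this paragraph (item difficulty as percentage correct, KR-20, and the two-sample Kolmogorov-Smirnov statistic) can be sketched in plain Python. This is a minimal illustration under stated assumptions: responses are scored 0/1, the function names are hypothetical, and the toy response matrix is invented; it is not the authors' analysis code.

```python
from bisect import bisect_right


def item_difficulties(responses):
    """Percentage of examinees answering each item correctly.

    responses: list of rows, one per examinee, each a list of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    return [100 * sum(row[j] for row in responses) / n for j in range(k)]


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var(X)), where p_j is the
    proportion correct on item j, q_j = 1 - p_j, and var(X) is the variance
    of total scores. Population (divide-by-n) variances are used throughout.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((x - mean_total) ** 2 for x in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)  # variance of a single 0/1 item
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def ks_two_sample_d(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs; the maximum is attained at one of the
    observed values, so only those points are checked.
    """
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical 0/1 responses (5 children x 4 items), not data from the study.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_difficulties(toy))  # [80.0, 60.0, 40.0, 20.0]
print(round(kr20(toy), 3))     # 0.8
```

To compare the two forms as described above, one would pass the 60 SPM item difficulties and the 60 SANTRA item difficulties to `ks_two_sample_d`. The sketch returns only D; a p-value requires the sampling distribution of D, which `scipy.stats.ks_2samp` provides (it returns both the statistic and a p-value).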
Mean number correct was 36.10 (SD = 11.52) for the SPM and 35.46 (SD = 11.93) for the alternate form. The slight difference between these means was statistically significant, t(448) = -2.56, p < .01. Table 2 contains the distribution of SPM minus SANTRA discrepancy scores. This frequency distribution revealed a negative skew (skewness = -.11), consistent with the slight but significant difference in the overall test means. Thus, although the SANTRA can be considered an alternate form, it was slightly more difficult than the SPM.
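Because every child took both forms, the comparison of means is a paired-samples t test, and its value can be approximately reconstructed from the summary statistics reported above. The sketch below assumes that the SD of the difference scores can be recovered from the two SDs and the alternate form correlation of .90 via s_d² = s₁² + s₂² - 2r·s₁·s₂; since r is reported only to two decimals, the result is an approximation, not the authors' computation. The function name is hypothetical.

```python
import math


def paired_t_from_summary(mean_1, sd_1, mean_2, sd_2, r, n):
    """Hypothetical helper: paired-samples t for mean_1 - mean_2 from summary stats.

    The SD of the difference scores is recovered from the two SDs and their
    correlation: s_d^2 = s_1^2 + s_2^2 - 2 * r * s_1 * s_2.
    Returns (t, df).
    """
    sd_diff = math.sqrt(sd_1 ** 2 + sd_2 ** 2 - 2 * r * sd_1 * sd_2)
    se = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return (mean_1 - mean_2) / se, n - 1


# SANTRA minus SPM, using the means, SDs, r, and n reported in the text.
t, df = paired_t_from_summary(35.46, 11.93, 36.10, 11.52, r=0.90, n=449)
print(f"t({df}) = {t:.2f}")  # t(448) = -2.58
```

The reconstructed value of about -2.58 is close to the reported t(448) = -2.56; a gap of this size is what rounding r to .90 would be expected to produce. The negative sign simply reflects subtracting the SPM mean from the SANTRA mean.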
For a subset of the sample (n = 32), achievement test scores were available. For these children, scores on each form were correlated (Pearson r) with the Comprehensive Test of Basic Skills (CTBS; 1982) Total Language, Total Reading, and Total Math scores. Table 3 shows the intercorrelation matrix. Inspection of Table 3 reveals that the two forms correlated approximately equally with each of the three achievement measures, with coefficients ranging from .42 to .56. Moreover, given the involvement of the left temporal lobe in solving APM problems, it is likely that subjects use a verbal strategy when solving SPM-type stimuli. Hence, verbal skills are likely utilized in solving the types of problems represented by the SPM and its alternate form, the SANTRA.
The SANTRA exhibited considerable potential as a highly reliable alternate form for the SPM. However, the SANTRA proved to be slightly more difficult than the SPM. Thus, the SANTRA can best be described as an alternate, but not an item-by-item parallel, form. The SANTRA is intended for research purposes only; it is not meant for clinical use or as a replacement for the SPM. Moreover, as the present psychometric data are based on children (ages 6 years, 8 months to 13 years, 10 months) from one geographic location, more work is needed to determine whether the present positive findings generalize to adults and to other populations. In addition, more work is needed to determine whether the SANTRA can be used to predict other criterion variables.¹
¹The SANTRA is available upon request, for the cost of materials, to qualified psychological researchers who agree to use the alternate form for research purposes only.