THE 1937 SCALE
The 1937 scale extended the age range down to the 2-year-old level. Also, by adding new tasks, developers increased the maximum possible mental age to 22 years, 10 months. Scoring standards and instructions were improved to reduce ambiguities, enhance the standardization of administration, and increase interscorer reliability. Furthermore, several performance items, which required the subject to do things such as copy designs, were added to decrease the scale’s emphasis on verbal skills. However, only some 25% of the items were nonverbal, so the test was not balanced between the two types of items (Becker, 2003).
The standardization sample was markedly improved. Whereas the 1916 norms were restricted to Californians, the new subjects for the 1937 Stanford-Binet standardization sample came from 11 U.S. states representing a variety of regions. Subjects were selected according to their fathers’ occupations. In addition, the standardization sample was substantially increased. Unfortunately, the sample included only whites and more urban subjects than rural ones. Nevertheless, it represented a threefold increase from the 1916 scale and was more than 63 times larger than the original sample of the 1905 scale.
Perhaps the most important improvement in the 1937 version was the inclusion of an alternate equivalent form. Forms L and M were designed to be equivalent in terms of both difficulty and content. With two such forms, the psychometric properties of the scale could be readily examined (see Figure 9.5).
Problems with the 1937 scale
A major problem with the 1937 scale was that its reliability coefficients were higher for older subjects than for younger ones. Thus, results for the latter were not as stable as those for the former. Reliability figures also varied as a function of IQ level, with higher reliabilities in the lower IQ ranges (i.e., less than 70) and poorer ones in the higher ranges. The lowest reliabilities occurred in the youngest age groups in the highest IQ ranges. These findings apply generally to all modern intelligence tests: scores are most unstable for young children in high IQ ranges.
Along with the differing reliabilities, each age group in the standardization sample produced a unique standard deviation of IQ scores. This differential variability in IQ scores as a function of age created the single most important problem in the 1937 scale. More specifically, despite the great care taken in selecting the standardization sample, different age groups showed significant differences in the standard deviation of IQ scores. For example, the standard deviation in the IQ scores at age 6 was approximately 12.5. The standard deviations at ages 2.5 and 12, on the other hand, were 20.6 and 20.0, respectively. Because of these discrepancies, IQs at one age level were not equivalent to IQs at another (see Focused Example 9.1).
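To make the non-equivalence concrete, the following minimal sketch converts the same IQ value into a standard score and a percentile at each of the three ages, using the standard deviations quoted above. It assumes a mean IQ of 100 at every age and normally distributed scores; neither assumption is stated in the text, and the function name `relative_standing` is purely illustrative.

```python
from statistics import NormalDist

# Standard deviations of 1937-scale IQs quoted in the text, keyed by age in years.
SD_BY_AGE = {2.5: 20.6, 6: 12.5, 12: 20.0}

# Assumption (not from the text): the mean IQ is 100 at every age level.
ASSUMED_MEAN = 100.0


def relative_standing(iq, age):
    """Return (z, percentile) for an IQ at a given age, assuming normality."""
    sd = SD_BY_AGE[age]
    z = (iq - ASSUMED_MEAN) / sd
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


if __name__ == "__main__":
    for age in sorted(SD_BY_AGE):
        z, pct = relative_standing(120, age)
        print(f"age {age}: IQ 120 -> z = {z:.2f}, percentile = {pct:.1f}")
```

Under these assumptions, an IQ of 120 lies 1.6 standard deviations above the mean at age 6 but only 1.0 standard deviation above it at age 12, so the same number reflects a noticeably different standing within each age group.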
THE 1960 STANFORD-BINET REVISION AND DEVIATION IQ (SB-LM)
The developers of the 1960 revision (SB-LM) tried to create a single instrument by selecting the best from the two forms of the 1937 scale. Tasks that showed an increase in the percentage passing with an increase in age (a main criterion and guiding principle for the construction of the Binet scale) received the highest priority, as did tasks that correlated highly with scores as a whole (a second guiding principle of the Binet scale). In addition, instructions for scoring and test administration were improved, and IQ tables were extended from age 16 to 18. Perhaps most important, the problem of differential variation in IQs was solved by the deviation IQ concept.
As used in the Stanford-Binet scale, the deviation IQ was simply a standard score with a mean of 100 and a standard deviation of 16 (today the standard deviation is set at 15). With the mean set at 100 and assigned to scores at the 50th percentile, the deviation IQ was ascertained by evaluating the standard deviation of mental age for a representative sample at each age level. New IQ tables were then constructed that corrected for differences in variability at the various age levels. By correcting for these differences, one could compare the IQs of one age level with those of another. Thus, scores could be interpreted in terms of standard deviations and percentiles. Today, the deviation IQ method is considered the most precise way of expressing the results of an intelligence test (see Figure 9.6).
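The deviation IQ can be sketched as an ordinary standard-score transformation: a score is expressed relative to the mean and standard deviation of its own age group, then rescaled to a mean of 100 and a standard deviation of 16. The example below is an illustration under stated assumptions, not the procedure actually used to build the 1960 tables. The norm values are hypothetical mental ages in months, the function name `deviation_iq` is invented for this example, and the population standard deviation (`pstdev`) is an arbitrary choice.

```python
from statistics import mean, pstdev


def deviation_iq(score, norm_scores, target_mean=100.0, target_sd=16.0):
    """Rescale a score to a deviation IQ using norms from the same age group."""
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # assumption: population SD of the norm group
    if sigma == 0:
        raise ValueError("norm scores must show some variability")
    z = (score - mu) / sigma
    return target_mean + target_sd * z


# Hypothetical mental ages (in months) for two age groups with different spreads.
norms_age_6 = [60, 66, 72, 78, 84]
norms_age_12 = [120, 132, 144, 156, 168]

if __name__ == "__main__":
    # Both children score at the top of their own norm group.
    print(round(deviation_iq(84, norms_age_6), 1))
    print(round(deviation_iq(168, norms_age_12), 1))
```

Although the two hypothetical groups differ in variability, children who stand equally far above their own group's mean in standard deviation units receive the same deviation IQ, which is the sense in which scores become comparable across age levels.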
The 1960 revision did not include a new normative sample or restandardization. However, by 1972, a new standardization group consisting of a representative sample of 2100 children (approximately 100 at each Stanford-Binet age level) had been obtained for use with the 1960 revision (Thorndike, 1973). Unlike all previous norms, the 1972 norms included nonwhites. For many, however, the general improvements in the 1960 revision, even with the new 1972 norms, did not suffice. In 1986, a new and drastically revised version of the Binet scale was published (Thorndike, Hagen, & Sattler, 1986). Then, in 2003, there was another major revision in which many of the concepts added to the 1986 edition were abandoned in favor of concepts used in the 1960 (SB-LM) version. The changes in 1986 and the possible reasons for the return to the older 1960 model are instructive.
The modern Binet scale
Our discussion of the evolution of the Binet scale has illustrated many of the concepts that have dominated intelligence testing from its inception to the present. The fourth and fifth editions of the Stanford-Binet scale continue this tradition of innovation and incorporation of central psychometric and theoretical concepts. In this section, we examine the fourth and fifth editions of the scale, which its authors developed in response to cultural and social changes and new research in cognitive psychology. First, we consider the basic model that guided this development and briefly discuss the features common to both editions. Next, we compare these latest editions to their predecessors. We begin with a brief look at how the fourth edition was changed. Then we consider the 2003 edition in greater detail: the various subtests, summary scores, and procedures. We also examine the scale’s psychometric properties. Finally, we examine the modern 2003 edition of the Binet in light of a relatively new theory of intelligence.
Model for the fourth and fifth editions of the Binet scale
The model for the latest editions of the Binet (Figure 9.7) is far more elaborate than the Spearman model that best characterized the original versions of the scale. These versions incorporate the gf-gc theory of intelligence. They are based on a hierarchical model. At the top of the hierarchy is g (general intelligence), which reflects the common variability of all tasks. At the next level are three group factors. Crystallized abilities reflect learning, that is, the realization of original potential through experience. Fluid-analytic abilities represent original potential, or the basic capabilities that a person uses to acquire crystallized abilities (Horn, 1994; Horn & Cattell, 1966; Horn & Noll, 1997). Short-term memory refers to one’s memory during short intervals, that is, the amount of information one can retain briefly after a single, short presentation (Colom, Flores-Mendoza, Quiroga, & Privado, 2005). In addition, crystallized ability has two subcategories: verbal reasoning and nonverbal reasoning (Pomplun & Custer, 2005).
The role of Thurstone’s multidimensional model
The model of the modern Binet represents an attempt to place an evaluation of g in the context of a multidimensional model of intelligence from which one can evaluate specific abilities. The impetus for a multidimensional model stemmed from the work of Thurstone (1938). He argued that, contrary to Spearman’s notion of intelligence as a single process, intelligence could best be conceptualized as comprising independent factors, or “primary mental abilities.” Years of painstaking work ultimately revealed evidence for group ability factors that were relatively, but not totally, independent. The group factors were correlated, and from them a g factor could be extracted, as in the hierarchical model of the fourth and fifth editions of the Binet.
Characteristics of the 1986 revision
The 1986 revision attempted to retain all of the strengths of the earlier revisions while eliminating the weaknesses. This was no easy task; nor was it a complete success, as indicated by the backtracking that occurred in the fifth edition. To continue to provide a measure of general mental ability, the authors of the 1986 revision decided to retain the wide variety of content and task characteristics of earlier versions. However, to avoid having this wide content unevenly distributed across age groups, the age scale format was entirely eliminated. In place of the age scale, items with the same content were placed together into any one of 15 separate tests to create point scales. For example, all vocabulary items were placed together in one test; all matrix items were placed together in a second.
The more modern 2003 fifth edition provided a more standardized hierarchical model with five factors, as illustrated in Figure 9.8. At the top of the hierarchy is general intelligence, just as in the 1986 edition. However, there are now five rather than four main factors. Each factor, in turn, has an equally weighted nonverbal and verbal measure. Figure 9.9 indicates the types of activities used to measure the various factors.