The Bell Curve: Intelligence and Class Structure in American Life

Authors: Richard J. Herrnstein,Charles A. Murray


Why are there two lines? Recall that the best-fitting line is the one that minimizes the aggregated distances between the data points and the line. For standardized measurements, it makes no difference whether the distances are measured along the pounds axis or the inches axis; for unstandardized measurements, it may make a difference. Hence we may get two lines, depending on which axis was used to fit the line. The two lines, which always intersect at the average values for the two variables, answer different questions. One answers the question we first posed: How much of a difference in pounds is associated with a given difference in inches (i.e., the regression of weight on height). The other one tells us how much of a difference in inches is associated with a given difference in pounds (i.e., the regression of height on weight).
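
The two-lines point can be checked numerically. The following is a minimal sketch, assuming ordinary least squares as the fitting rule; the heights and weights are hypothetical values invented for illustration, not data from the text. It computes both slopes in raw units and then again after standardizing, where both slopes equal the correlation coefficient.

```python
import numpy as np

# Hypothetical heights (inches) and weights (pounds); not data from the text.
height = np.array([58.0, 60.0, 61.0, 63.0, 65.0, 66.0, 68.0, 70.0])
weight = np.array([88.0, 95.0, 92.0, 104.0, 110.0, 108.0, 121.0, 125.0])


def slope(x, y):
    """Least-squares slope for predicting y from x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def standardize(v):
    """Express v in standard deviation units around its mean."""
    return (v - v.mean()) / v.std(ddof=1)


# Regression of weight on height: pounds per inch.
b_weight_on_height = slope(height, weight)
# Regression of height on weight: inches per pound.
b_height_on_weight = slope(weight, height)

# In standardized units both slopes equal the correlation coefficient r.
r = np.corrcoef(height, weight)[0, 1]
b_std_weight_on_height = slope(standardize(height), standardize(weight))
b_std_height_on_weight = slope(standardize(weight), standardize(height))
```

In raw units the two slopes differ because they are expressed in different units (pounds per inch versus inches per pound); their product equals the squared correlation.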

Multiple Regression
 

Multiple regression analysis is the main way that social science deals with the multiple relationships that are the rule in social science. To get a fix on multiple regression, let us return to the high school gym for the last time. Your classmates are still scattered about the floor. Now imagine a pole, erected at the intersection of 60 inches and 90 pounds, marked in inches from 18 inches to 50 inches. For some inscrutable reason, you would like to know the impact of both height and weight on a boy’s waist size. Since imagination can defy gravity, you ask each boy to levitate until the soles of his shoes are at the elevation that reads on the pole at the waist size of his trousers. In general, the taller and heavier boys must rise the most, the shorter and slighter ones the least, and most boys, middling in height and weight, will have middling waist sizes as well. Multiple regression is a mathematical procedure for finding that plane, slicing through the space in the gym, that minimizes the aggregated distances (in this instance, along the waist size axis) between the bottoms of the boys’ shoes and the plane.

The best-fitting plane will tilt upward toward heavy weights and tall heights. But it may tilt more along the pounds axis than along the inches axis, or vice versa. It may tilt equally for each. The slope of the tilt along each of these axes is again a regression coefficient. With two variables predicting a third, as in this example, there are two coefficients. One of them tells us how much of an increase in trouser waist size is associated with a given increase in weight, holding height constant; the other, how much of an increase in trouser waist size is associated with a given increase in height, holding weight constant.
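
The plane-fitting idea can be sketched in a few lines. This is an illustration only, assuming ordinary least squares (squared distances along the waist axis). The data are hypothetical and deliberately constructed so that waist size lies exactly on a known plane, which lets the recovered coefficients be compared with the ones used to build the data.

```python
import numpy as np

# Hypothetical data for illustration (not from the text). Waist size is
# constructed to lie exactly on a plane with known slopes.
height = np.array([58.0, 60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0])
weight = np.array([90.0, 85.0, 110.0, 100.0, 130.0, 120.0, 150.0, 140.0])
waist = 4.0 + 0.2 * height + 0.1 * weight

# Design matrix: a column of ones (the intercept) plus the two predictors.
X = np.column_stack([np.ones_like(height), height, weight])

# Least squares finds the plane minimizing squared waist-axis distances.
coef, *_ = np.linalg.lstsq(X, waist, rcond=None)
intercept, b_height, b_weight = coef

# b_height: change in waist per inch of height, holding weight constant.
# b_weight: change in waist per pound of weight, holding height constant.
```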

With two variables predicting a third, we reach the limit of visual imagination. But the principle of multiple regression can be extended to any number of variables. Income, for example, may be related not just to education but also to age, family background, IQ, personality, business conditions, region of the country, and so on. The mathematical procedures will yield coefficients for each of them, indicating again how much of a change in income can be anticipated for a given change in any particular variable, with all the others held constant.

Logistic Regression
 

The text frequently resorts to a method of analysis called logistic regression. Here, we need only say what the method is for rather than what it is. Many of the variables we discuss are such things as being unemployed or not, being married or not, being a parent or not, and so on. Because they are measured in two values, corresponding to yes and no, they are called binary variables. Logistic regression is an adaptation of ordinary regression analysis tailored to the case of binary variables. (It can also be used for variables with larger numbers of discrete values.) It tells us how much change there is in the probability of being unemployed, married, and so forth, given a unit change in any given variable, holding all other variables in the analysis constant.
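
For readers who want to see the mechanics, here is a minimal sketch of a one-predictor logistic regression fitted by Newton-Raphson. The data are hypothetical, and the fitting loop is a generic textbook approach, not necessarily the software or procedure used for the analyses in the text. The last line shows the kind of quantity described above: the change in predicted probability for a one-unit change in the predictor.

```python
import numpy as np

# Hypothetical data: x is some predictor, y is a yes/no outcome (1 = yes).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0])

X = np.column_stack([np.ones_like(x), x])  # intercept + predictor


def sigmoid(t):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-t))


# Newton-Raphson iterations on the log-likelihood.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = sigmoid(X @ beta)
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

p_hat = sigmoid(X @ beta)

# Change in predicted probability when x moves from 4 to 5.
delta_p = sigmoid(beta[0] + beta[1] * 5.0) - sigmoid(beta[0] + beta[1] * 4.0)
```

Unlike an ordinary regression slope, the probability change depends on where along the predictor it is evaluated, which is why the example fixes a starting value.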

Appendix 2
Technical Issues Regarding the National Longitudinal Survey of Youth
 

This appendix provides details about the variables used in the text and about other technical issues associated with the NLSY.[1] Colleagues who wish to recreate analyses will need additional information, which may be obtained from the authors.[2]

SURVEY YEAR, CONSTANT DOLLARS, AND SAMPLE WEIGHTS
 

Our use of the NLSY extends through the 1990 survey year.[3]

All dollar figures are expressed in 1990 dollars, using the consumer price index inflators as reported in the 1992 edition of Statistical Abstract of the United States, Table 737.
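
Converting to constant dollars amounts to multiplying by a ratio of price index values. The sketch below shows the arithmetic only; the function name is hypothetical, and the index values are placeholders, not the figures from the Statistical Abstract table cited above.

```python
def to_1990_dollars(amount, cpi_year, cpi_1990):
    """Re-express a nominal amount in 1990 dollars via a price-index ratio."""
    return amount * (cpi_1990 / cpi_year)


# Placeholder index values for illustration only; they are NOT the
# published consumer price index figures.
cpi_example_year = 80.0
cpi_1990_example = 100.0

income_1990 = to_1990_dollars(10_000.0, cpi_example_year, cpi_1990_example)
```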

Sample weights were employed in all analyses in the main text. We do not so note in each instance, to simplify the description. In computing scores that were based on the 11,878 subjects who had valid scores on the Armed Forces Qualification Test (AFQT), we used the sampling weights specifically assigned for the AFQT population. For analyses based on the NLSY subjects’ status as of a given year (usually 1990), we used the sampling weights for that survey year. For analyses in which the children of NLSY women were the unit of analysis, the child’s sampling weights were used rather than the mother’s.
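
As an illustration of what applying sample weights means for a simple statistic, the sketch below computes a weighted mean and standard deviation, treating each weight as the number of people in the population that a subject represents. The function names are hypothetical, and the variance uses the plain weighted form with no small-sample correction, which is an assumption.

```python
import numpy as np


def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * values) / np.sum(weights)


def weighted_sd(values, weights):
    """Standard deviation around the weighted mean (no small-sample correction)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = weighted_mean(values, weights)
    variance = np.sum(weights * (values - mean) ** 2) / np.sum(weights)
    return np.sqrt(variance)
```

For example, `weighted_mean([1, 2, 3], [1, 1, 2])` counts the value 3 twice as heavily as the others.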

To make interpretation of the statistical significance easier, we replicated all the analyses in Part II using just the unweighted cross-sectional sample of whites, as reported in Appendix 4.

SCORING OF THE ARMED FORCES QUALIFICATION TEST (AFQT)
 

The AFQT is a combination of highly g-loaded subtests from the Armed Services Vocational Aptitude Battery (ASVAB) that serves as the armed services’ measure of cognitive ability, described in detail in Appendix 3. Until 1989, the AFQT consisted of the summed raw scores of the ASVAB’s arithmetic reasoning, word knowledge, and paragraph comprehension subtests, plus half of the score on the numerical operations subtest. In 1989, the armed forces decided to rescore the AFQT so that it consisted of the word knowledge, paragraph comprehension, arithmetic reasoning, and mathematics knowledge subtests. The reason for the change was to avoid the numerical operations subtest, which was both less highly g-loaded than the mathematics knowledge subtest and sensitive to small discrepancies in the time given to subjects when administering the test (numerical operations is a speeded test in which the subject completes as many arithmetic problems as possible within a time limit).
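
The two scoring schemes can be written out as follows. The pre-1989 function follows the description above directly. For the 1989 revision, the text lists the subtests but not how they are weighted or scaled, so the unweighted sum below is an assumption made purely for illustration; the function and parameter names are likewise hypothetical.

```python
def afqt_pre_1989(arithmetic_reasoning, word_knowledge,
                  paragraph_comprehension, numerical_operations):
    """Summed raw scores plus half of the numerical operations score."""
    return (arithmetic_reasoning + word_knowledge
            + paragraph_comprehension + 0.5 * numerical_operations)


def afqt_1989_sketch(word_knowledge, paragraph_comprehension,
                     arithmetic_reasoning, mathematics_knowledge):
    """ASSUMPTION: an unweighted sum; the text does not give the weighting."""
    return (word_knowledge + paragraph_comprehension
            + arithmetic_reasoning + mathematics_knowledge)
```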

A draft of The Bell Curve was well underway when we became aware of the 1989 scoring scheme. We completed a full draft using the 1980 scoring system but decided that the revised scoring system was psychometrically superior to the old one and therefore replicated all of the analyses using the 1989 version.

Scholars who wish to replicate our analyses should note that the 1989 AFQT score as reported in the NLSY database is not the one used in the text. The NLSY’s variable is rounded to the nearest whole centile and based on the 18- to 23-year-old subset of the NLSY sample. We recomputed the AFQT from scratch using the raw subtest scores, and the population mean and standard deviation used in producing the across-ages AFQT score were based on all 11,878 subjects, not just those ages 18 to 23.[4] This measure is useful for multivariate analyses in which age is also entered as an independent variable but should not be used (and is never used in the text) as a representation of an individual subject’s cognitive ability because of age-related differences in test scores (see discussion below).

Age
 

AFQT scores in the NLSY sample rose by an average of .07 standard deviations per year. The simplest explanation for this is that the AFQT was designed by the military for a population of recruits who would be taking the test in their late teens, and younger subjects in the NLSY sample got lower scores for the same reason that high school freshmen get lower SAT scores than high school seniors. However, a cohort effect could also be at work, whereby (because of educational or broad environmental reasons) youths born in the first half of the 1960s had lower realized cognitive ability than youths born in the last half of the 1950s. There is no empirical way of telling which reason really explains the age-related differences in the AFQT or what the mix of reasons might be. The age-related increase is not perfectly linear (it levels off in the top two years) but close enough that the age problem is best handled in the multivariate analyses by entering the subject’s birthdate as an independent variable (all the NLSY sample took the AFQT within a few months of each other in late 1980).

For all analyses except the multivariate regression analyses, we use age-equated scores. These were produced by using the sample weight as a frequency, then preparing separate distributions by birth year, expressed in centiles.[5] Each subject’s rank in that population (mathematically, the “population” is the sum of the sample weights for that birth year) was divided by the population to obtain the centile where that subject fell within his birth year cohort.[6]
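
The age-equating procedure can be sketched as below. The text does not say how ties or the subject's own weight enter the rank, so this version counts the total weight of cohort members scoring at or below the subject; that convention, and the function name, are assumptions.

```python
import numpy as np


def age_equated_centiles(scores, weights, birth_years):
    """Weighted centile of each subject within his or her birth-year cohort."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    birth_years = np.asarray(birth_years)
    centiles = np.empty_like(scores)
    for year in np.unique(birth_years):
        idx = np.flatnonzero(birth_years == year)
        s, w = scores[idx], weights[idx]
        population = w.sum()  # sum of the sample weights for this birth year
        # ASSUMPTION: rank = total weight of cohort members at or below the score.
        rank = np.array([w[s <= value].sum() for value in s])
        centiles[idx] = 100.0 * rank / population
    return centiles


# Hypothetical scores, weights, and birth years for illustration.
example = age_equated_centiles(
    scores=[10, 20, 30, 10, 20],
    weights=[1, 1, 2, 1, 3],
    birth_years=[1960, 1960, 1960, 1961, 1961],
)
```

Note that the same raw score of 20 lands at a different centile in each cohort, which is the point of equating by birth year.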

That AFQT scores vary according to education raises an additional issue: To what extent is the AFQT a measure of cognitive ability, and not just length and quality of education? We explore this issue at length in Appendix 3.

Skew
 

The distribution of the AFQT in either of its versions is skewed so that the high scores tend to be more closely bunched than the low scores. To put it roughly, the most intelligent people who take the test have less of an opportunity to get a high score than the least intelligent people have to get a low score. One effect is to limit artificially the maximum size of a standardized score. It is artificial because the AFQT does in fact discriminate reasonably well at the high end of the scale. For example, only 22 youths out of 11,878 in the NLSY with valid AFQT scores earned perfect scores on the subtests, representing 0.253 percent of the national population of their age (using sampling weights). In a test with a normal distribution, those youths would have had a standardized score of 2.80. But given the skew in the NLSY, it is impossible for anyone to have a standardized score higher than 1.66. The standard deviation for a high-scoring group is similarly squeezed.
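
The 2.80 figure can be reproduced from the 0.253 percent share using the inverse of the standard normal cumulative distribution. This sketch assumes the share is read as the fraction of the population lying above the cutoff.

```python
from statistics import NormalDist

# Share of the weighted population with perfect scores, from the text.
share_above = 0.253 / 100.0

# Standardized score that leaves this share in the upper tail of a normal curve.
z_if_normal = NormalDist().inv_cdf(1.0 - share_above)
```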

A certain amount of skew is not a concern for many kinds of analysis. For the analyses in The Bell Curve, however, the difference between two groups is often expressed in terms of standard deviations, and the size of that difference was likely to be affected by skew.

We therefore computed standardized scores corrected for skew, first by computing the centile scores for the NLSY population, using sample weights as always, then assigning to each subject the standardized score corresponding to that centile in a normal distribution. We did this for both the old and new versions of the AFQT. Following armed forces’ convention, all scores greater or smaller than 3 standard deviations from the mean were set at 3 standard deviations (this affected only a small number of scores at the low end of the distribution).
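
A minimal sketch of this correction follows: compute each subject's weighted centile, map it to the matching standardized score in a normal distribution, and truncate at 3 standard deviations. The midpoint treatment of ties is an assumption (the text does not specify one), chosen here so that no centile is exactly 0 or 1; the function name is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normalized_scores(scores, weights, limit=3.0):
    """Map weighted centiles to normal-distribution z scores, clipped at +/- limit."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    normal = NormalDist()
    z = np.empty_like(scores)
    for i, value in enumerate(scores):
        below = weights[scores < value].sum()
        tied = weights[scores == value].sum()
        # ASSUMPTION: midpoint convention keeps the centile strictly in (0, 1).
        centile = (below + 0.5 * tied) / total
        z[i] = normal.inv_cdf(centile)
    return np.clip(z, -limit, limit)


# Hypothetical examples: a symmetric case and one with an extreme low score.
z_symmetric = normalized_scores([1, 2, 3], [1, 1, 1])
z_clipped = normalized_scores([0, 1], [1, 9999])
```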

The effects of correcting for skew were noticeable when expressing differences between groups. For example, for the most sensitive group comparison, between ethnic groups, the results are shown in the following table. As always when full information about means, standard deviations, and sample sizes is available, the group differences are computed using the weighted average of the groups’ standard deviations. The equation is given in note 25 for Chapter 13. The primary effect of the skew was to squeeze the standard deviation of the higher-scoring group (whites) and, in comparison, elongate the standard deviation of the lower scoring groups. Correcting for skew thus shrank both the black-white and Latino-white differences. The same phenomenon affected all comparisons involving subgroups with markedly different AFQT means. All standardized AFQT scores, for both the regression analyses and the age-equated scores, are therefore corrected for skew. In other words, each represents the standardized score in a normal distribution that corresponds to the (unrounded) centile score of the subject in the observed distribution.

Comparison of Two Versions of the AFQT, Uncorrected and Corrected for Skew

Version of     Corrected    Black           Latino          White           Black/White   Latino/White
the AFQT       for Skew?    Mean    SD      Mean    SD      Mean    SD      Difference    Difference
Pre-1989       No           −.97    .91     −.67    1.01    .24     .88     1.36          1.02
Pre-1989       Yes          −.90    .81     −.64    .93     .23     .92     1.25          .94
1989 revision  No           −.93    .87     −.67    .98     .23     .90     1.30          .99
1989 revision  Yes          −.88    .83     −.64    .94     .22     .92     1.21          .93
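
The group differences in the table are expressed in standard deviation units, using a weighted average of the groups' standard deviations. The exact equation is in a note not reproduced here, so the sketch below uses one common form (a sample-size-weighted pooling of the variances) as an assumption; the function name and the numbers in the example are hypothetical.

```python
import math


def group_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference in means divided by a sample-size-weighted pooled SD.

    ASSUMPTION: the pooling formula below is a common convention and may
    not match the authors' equation term for term.
    """
    pooled_sd = math.sqrt((n_a * sd_a ** 2 + n_b * sd_b ** 2) / (n_a + n_b))
    return (mean_a - mean_b) / pooled_sd


# Hypothetical groups with equal SDs, so the pooled SD is simply 2.0.
example_difference = group_difference(1.0, 2.0, 10, 0.0, 2.0, 30)
```

Because the pooled standard deviation is the denominator, anything that squeezes or stretches a group's standard deviation, as skew does, changes the size of the reported difference even when the means are unchanged.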

The effects of the different scoring methods on ethnic differences raise a larger question that we should answer directly: How would the results presented in this book be different if we had used the 1980 version of the AFQT instead of the 1989 version? Or if we had left the scores uncorrected for skew? For most analyses, the answer is that the results are unaffected. But it may also be said that whenever differences were found, the scoring procedure we used tended to produce smaller relationships between IQ and the indicators, and smaller ethnic differences, than the alternatives. We did not compute every analysis by each of the four scoring permutations, but we did replicate all of the analyses using the two extremes (1980 version uncorrected for skew and the 1989 version corrected for skew). In no instance did the 1989 version corrected for skew (the version reported in the text) yield significant findings that were not also found when using the 1980 uncorrected version. In terms of the relationships explored in this book, the 1989 version corrected for skew is the most conservative of the alternatives.
