Statistics Essentials For Dummies

Authors: Deborah Rumsey

If one or both of the sample sizes are small (less than 30), you use the appropriate value from the $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom instead of $z^*$ (see Table A-2 in the appendix).

Suppose you want to estimate with 95% confidence the difference between the mean (average) lengths of cobs from two varieties of sweet corn (allowing them to grow the same number of days under the same conditions). Call the two varieties Corn-e-stats and Stats-o-sweet.

Suppose your random sample of 100 cobs of the Corn-e-stats variety averages 8.5 inches, with a standard deviation of 2.3 inches, and your random sample of 110 cobs of Stats-o-sweet averages 7.5 inches, with a standard deviation of 2.8 inches. That is, $\bar{x}_1 = 8.5$, $s_1 = 2.3$, and $n_1 = 100$ from the Corn-e-stats; and $\bar{x}_2 = 7.5$, $s_2 = 2.8$, and $n_2 = 110$ from the Stats-o-sweet.

Notice that the population standard deviations are unknown; when this is the case, you substitute the appropriate value from the $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom for $z^*$. In this case the degrees of freedom are 100 + 110 - 2 = 208; with this many degrees of freedom, the $t$- and $Z$-distributions are approximately equal (see Chapter 9), so you can use 1.96 for the appropriate value of $t$ anyway (see the last row of Table A-2 in the appendix).
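
As a quick check on the degrees of freedom and on the 1.96 value, the sketch below uses only Python's standard library. It assumes a 95% confidence level, so $z^*$ is the 97.5th percentile of the standard normal distribution. The standard library has no $t$-distribution, so for small samples you would still read $t$ from Table A-2 (or use a statistics package such as SciPy, whose `scipy.stats.t.ppf(q, df)` returns $t$ percentiles).

```python
from statistics import NormalDist

# Sample sizes from the sweet-corn example
n1, n2 = 100, 110

# Degrees of freedom used for the t-distribution: n1 + n2 - 2
df = n1 + n2 - 2

# 95% confidence leaves 2.5% in each tail, so z* is the 97.5th percentile
z_star = NormalDist().inv_cdf(0.975)

print(df, round(z_star, 2))
```

With 208 degrees of freedom the $t$ value is only slightly larger than this $z^*$, which is why the example rounds both to 1.96.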

The difference between the sample means, $\bar{x}_1 - \bar{x}_2$, is 8.5 - 7.5 = +1 inch. The average for Corn-e-stats minus the average for Stats-o-sweet is positive, so in this sample the Corn-e-stats cobs are longer on average. Is that difference enough to generalize to the entire population, though? That's what this confidence interval is going to help you decide.

To calculate the margin of error, square $s_1$ (2.3) to get 5.29 and divide by 100 to get 0.0529; then square $s_2$ (2.8) and divide by 110 to get 7.84/110 = 0.0713. The sum is 0.0529 + 0.0713 = 0.1242; the square root is 0.3524. Multiply 1.96 times 0.3524 to get 0.69 inches, the margin of error.
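
The arithmetic above can be reproduced step by step. This is a minimal sketch of the margin of error $z^* \sqrt{s_1^2/n_1 + s_2^2/n_2}$ using the summary statistics from the example; the variable names are illustrative only.

```python
import math

# Summary statistics from the sweet-corn example
s1, n1 = 2.3, 100   # Corn-e-stats
s2, n2 = 2.8, 110   # Stats-o-sweet
z_star = 1.96       # 95% confidence

term1 = s1 ** 2 / n1                         # 5.29 / 100 = 0.0529
term2 = s2 ** 2 / n2                         # 7.84 / 110, about 0.0713
standard_error = math.sqrt(term1 + term2)    # square root of about 0.1242
margin_of_error = z_star * standard_error    # about 0.69 inches

print(round(term1, 4), round(term2, 4), round(standard_error, 4), round(margin_of_error, 2))
```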

Your 95% confidence interval for the difference between the average lengths for these two varieties of sweet corn is 1 inch, plus or minus 0.69 inches. (The lower end of the interval is 1 - 0.69 = 0.31 inches; the upper end is 1 + 0.69 = 1.69 inches.) You conclude that the cobs of the Corn-e-stats variety are longer, on average, than those of the Stats-o-sweet variety, by between 0.31 and 1.69 inches, with a 95% level of confidence.
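
The whole procedure can be wrapped in one reusable function. The sketch below is hypothetical (the name `diff_of_means_ci` and its `crit` parameter are illustrative, not from the book). It computes $(\bar{x}_1 - \bar{x}_2) \pm \text{crit} \cdot \sqrt{s_1^2/n_1 + s_2^2/n_2}$, with `crit` defaulting to 1.96 as in the example; for small samples you would pass the $t$ value from Table A-2 instead.

```python
import math


def diff_of_means_ci(mean1, sd1, n1, mean2, sd2, n2, crit=1.96):
    """Return (lower, upper) for mean1 - mean2 from summary statistics.

    crit is the critical value: z* for large samples, or a t value
    looked up separately when the samples are small.
    """
    diff = mean1 - mean2
    margin = crit * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - margin, diff + margin


# Corn-e-stats (group 1) versus Stats-o-sweet (group 2)
low, high = diff_of_means_ci(8.5, 2.3, 100, 7.5, 2.8, 110)
print(f"({low:.2f}, {high:.2f}) inches")
```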

Notice that all the values in this interval are positive. That's why you conclude that one variety is longer than the other (according to your data). If some of the values in the confidence interval were positive and some were negative (that is, if the interval contained zero), you wouldn't conclude that one variety was longer than the other on average.
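
The reading rule for the interval can be expressed as a small decision function. This is a hypothetical helper (`interpret_difference_ci` is an illustrative name) that takes the endpoints of an interval for $\mu_1 - \mu_2$ and reports which conclusion, if any, the interval supports.

```python
def interpret_difference_ci(low, high):
    """Describe what a confidence interval for mu1 - mu2 lets you conclude."""
    if low > 0:
        # Every plausible difference is positive
        return "group 1 has the larger mean"
    if high < 0:
        # Every plausible difference is negative
        return "group 2 has the larger mean"
    # The interval contains zero, so "no difference" is still plausible
    return "no conclusion: the interval contains zero"


print(interpret_difference_ci(0.31, 1.69))   # the sweet-corn interval
print(interpret_difference_ci(-0.2, 0.8))    # a hypothetical interval straddling zero
```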

Also note that there is a difference between the "difference in the means" and the "mean of the differences." If you're looking at pairs of data (such as pre-test versus post-test) and are examining the differences, you only have one data set and one population. Use the methods in the "Confidence Interval for a Population Mean" section to find a confidence interval for the "mean difference." If you're examining the difference in the means of two separate populations (such as males versus females), use the methods in this section to find a confidence interval for the "difference of two means."
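
To see why the distinction matters, the sketch below uses made-up pre-test and post-test scores for five people (the data are hypothetical). The mean of the differences and the difference of the means give the same point estimate, but the standard errors differ a lot: the paired approach works with one data set of differences, while treating the scores as two independent samples ignores the pairing.

```python
import math
from statistics import mean, stdev

# Hypothetical pre-test and post-test scores for the same five people
pre = [70, 65, 80, 75, 60]
post = [74, 66, 85, 79, 61]
n = len(pre)

# Paired data: one data set (the differences) from one population
diffs = [after - before for before, after in zip(pre, post)]
mean_of_differences = mean(diffs)
se_paired = stdev(diffs) / math.sqrt(n)

# Treating the same numbers (wrongly) as two independent samples
difference_of_means = mean(post) - mean(pre)
se_independent = math.sqrt(stdev(pre) ** 2 / n + stdev(post) ** 2 / n)

print(mean_of_differences, difference_of_means)        # same point estimate
print(round(se_paired, 2), round(se_independent, 2))   # very different standard errors
```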

Notice that you could get a negative value for $\bar{x}_1 - \bar{x}_2$. For example, if you had switched the two varieties of corn, you would have gotten -1 for this difference. That's fine; just remember which group is which. A positive difference means the first group has a larger mean than the second group; a negative difference means the first group has a smaller mean than the second group. If you want to avoid negative values, always make the group with the larger sample mean your first group; then $\bar{x}_1 - \bar{x}_2$ will always be positive.

Confidence Interval for the Difference of Two Proportions

When two populations are compared regarding some categorical variable (such as comparing males to females regarding their opinion of a four-day work week), you estimate the difference between the two population proportions. You do this by taking the difference in their corresponding sample proportions (one from each population) plus or minus a margin of error. The result is called a confidence interval for the difference of two population proportions, $p_1 - p_2$.

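The formula for a confidence interval for the difference between two population proportions is:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

where $\hat{p}_1$ and $\hat{p}_2$ are the two sample proportions and $n_1$ and $n_2$ are the two sample sizes.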
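
A minimal sketch of this interval, assuming the usual large-sample formula $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$. The function name `diff_of_proportions_ci` is hypothetical, and the survey counts in the usage line are made up for illustration.

```python
import math


def diff_of_proportions_ci(x1, n1, x2, n2, z_star=1.96):
    """Large-sample CI for p1 - p2, given x1 'yes' answers out of n1 and x2 out of n2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Hypothetical survey: 120 of 200 males and 90 of 180 females favor a four-day work week
low, high = diff_of_proportions_ci(120, 200, 90, 180)
print(f"({low:.3f}, {high:.3f})")
```

As with means, the order of the groups only affects the sign: swapping them negates both endpoints.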
