Statistics Essentials For Dummies (55 page)

Authors: Deborah Rumsey

Misinterpreted Correlations

Correlation is one of the most misunderstood and misused statistical terms used by researchers, the media, and the general public. (You can read all about this in Chapter 10.) Here are my three major correlation pet peeves:

Correlation applies only to two numerical variables, such as height and weight.
So, if you hear someone say, "It appears that the voting pattern is correlated with gender," you know that's statistically incorrect. Voting pattern and gender may be associated, but they can't be correlated in the statistical sense.

Correlation measures the strength and direction of a linear relationship.
If the correlation is weak, you can say there is little or no linear relationship; however, some other type of relationship might exist, for example, a curve (such as supply and demand curves in economics).

Correlation doesn't imply cause and effect.
Suppose someone reports that the more people drink diet cola, the more weight they gain. If you're a diet cola drinker, don't panic just yet. This may be a freak of nature that someone stumbled onto. At most, it means more research needs to be done (for example, a well-designed experiment) to explore any possible connection.
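
To illustrate the second pet peeve, here is a minimal sketch in Python. The helper name pearson_r and the data are made up for illustration and are not from the book. It computes the correlation for a perfectly U-shaped relationship and for a perfectly straight-line one, using the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: x values symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]
curved = [x ** 2 for x in xs]       # y depends entirely on x, but along a curve
straight = [2 * x + 1 for x in xs]  # y depends on x along a straight line

r_curved = pearson_r(xs, curved)
r_straight = pearson_r(xs, straight)
print("curved:", r_curved)
print("straight:", r_straight)
```

The curved data give a correlation of zero even though y is completely determined by x, while the straight-line data give a correlation of 1. A weak correlation rules out a linear relationship only, not a relationship of some other shape.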

Confounding Variables

Suppose a researcher claims that eating seaweed helps you live longer; you read interviews with the subjects and discover that they were all over 100, ate very healthy foods, slept an average of 8 hours a day, drank a lot of water, and exercised. Can you say the long life was caused by the seaweed? You can't tell, because so many other variables exist that could also promote long life (the diet, the sleeping, the water, the exercise); these are all confounding variables.

A common error in research studies is to fail to control for confounding variables, leaving the results open to scrutiny. The best way to head off confounding variables is to do a well-designed experiment in a controlled setting.

Observational studies are great for surveys and polls, but not for showing cause-and-effect relationships, because they don't control for confounding variables. A well-designed experiment provides much stronger evidence. (See Chapter 13.)

Botched Numbers

Just because a statistic appears in the media doesn't mean it's correct. Errors appear all the time (by accident or by design), so look for them. Here are some tips for spotting botched numbers:

Make sure everything adds up to what it's reported to.
With pie charts, be sure the percentages add up to 100% (or very close to it — there may be round-off error).

Double-check even the most basic of calculations.
For example, a chart says 83% of Americans are in favor of an issue, but the report says 7 out of every 8 Americans are in favor of the issue. 7 divided by 8 is 87.5%, not 83%, so the two figures can't both be right.

Look for the response rate of a survey — don't just be happy with the number of participants.
(The response rate is the number of people who responded divided by the total number of people surveyed times 100%.) If the response rate is much lower than 70%, the results could be biased, because you don't know what the nonrespondents would have said.

Question the type of statistic used to determine if it's appropriate.
For example, the number of crimes went up, but so did population size. Researchers should have reported crime rate (crimes per capita) instead.
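
The checks in these tips are simple arithmetic, so they can be sketched in a few lines of Python. The function names, the one-percentage-point round-off tolerance, and every number except the 7-out-of-8 example are assumptions made up for illustration.

```python
def percentages_add_up(percentages, tolerance=1.0):
    """True if the slices total 100%, allowing for round-off (tolerance is an assumption)."""
    return abs(sum(percentages) - 100.0) <= tolerance

def fraction_as_percent(numerator, denominator):
    """Convert 'numerator out of denominator' to a percentage."""
    return 100.0 * numerator / denominator

def response_rate(responded, surveyed):
    """Number who responded divided by number surveyed, times 100%."""
    return 100.0 * responded / surveyed

def rate_per_capita(count, population):
    """Events per person, so places or years with different populations can be compared."""
    return count / population

# Hypothetical pie chart slices
print(percentages_add_up([45, 30, 24.9]))   # small gap, plausibly round-off
print(percentages_add_up([45, 30, 35]))     # totals 110%, something is wrong

# The example from the text: 7 out of every 8
print(fraction_as_percent(7, 8))

# Hypothetical survey: 1,000 people surveyed, 350 responded
print(response_rate(350, 1000))

# Hypothetical crime counts: more crimes, but an even larger population
rate_before = rate_per_capita(1000, 100_000)
rate_after = rate_per_capita(1100, 120_000)
print(rate_before, rate_after)
```

In the made-up crime figures, the count rises from 1,000 to 1,100 while the per capita rate falls, which is why the count alone is the wrong statistic to report.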

Statistics are based on formulas and calculations that don't know any better. The people plugging in the numbers should know better, but sometimes they either don't or they don't want you to catch on. You, as a consumer of information (also known as a certified skeptic), must be the one to take action. The best policy is to ask questions.

Selectively Reporting Results

Another bad move is when a researcher reports a "statistically significant" result but fails to mention that he found it among 50 different statistical tests he performed, the other 49 of which were not significant. This behavior is called data fishing, and it is not acceptable statistical practice. If he performs each test at a significance level of 0.05, he should expect to "find" a result that's not really there 5 percent of the time just by chance (see Chapter 8 for more on Type I errors). In 50 tests, he should expect at least one of these errors, and I'm betting that accounts for his one "statistically significant" result.
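
The arithmetic behind that bet can be sketched as follows. This assumes, for illustration only, that the 50 tests are independent and that none of the effects being tested is real; neither assumption is stated in the text, and the variable names are made up.

```python
alpha = 0.05   # significance level of each test
n_tests = 50   # number of tests performed

# Average number of false "finds" when no real effect exists
expected_false_positives = alpha * n_tests

# Chance that at least one test comes out "significant" by luck alone,
# assuming the tests are independent
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(expected_false_positives)
print(round(prob_at_least_one, 3))
```

Under these assumptions the researcher would see about 2.5 false positives on average, and the chance of at least one is roughly 92 percent, so a single significant result out of 50 tests is weak evidence on its own.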
