Statistics Essentials For Dummies (5 page)

Authors: Deborah Rumsey

When it comes to measures of center, the average doesn't always tell the whole story and may be a bit misleading. Take NBA salaries. Every year, a few top-notch players (like Shaq) make much more money than anybody else. These are called outliers (numbers in the data set that are extremely high or low compared to the rest). Because of the way the average is calculated, high outliers drive the average upward (as Shaq's salary did in the preceding example). Similarly, outliers that are extremely low tend to drive the average downward.

What can you report, other than the average, to show what the salary of a "typical" NBA player would be? Another statistic used to measure the center of a data set is the median. The median of a data set is the place that divides the data in half, once the data are ordered from smallest to largest. It is denoted by M or $\tilde{x}$. To find the median of a data set:

1. Order the numbers from smallest to largest.

 

2. If the data set contains an odd number of numbers, the one exactly in the middle is the median.

 

3. If the data set contains an even number of numbers, take the two numbers that appear exactly in the middle and average them to find the median.

 

For example, take the data set 4, 2, 3, 1. First, order the numbers to get 1, 2, 3, 4. Then note this data has an even number of numbers, so go to Step 3. Take the two numbers in the middle — 2 and 3 — and find their average: 2.5.

Note that if the data set contains an odd number of values, the median will be one of the numbers in the data set itself. However, if it contains an even number of values, the median may be one of the numbers (the data set 1, 2, 2, 3 has median 2), or it may not be, as the data set 4, 2, 3, 1 (whose median is 2.5) shows.
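The three steps above can be sketched in a few lines of Python. This is only an illustration; the function name `median` is chosen here and does not come from the book.

```python
def median(values):
    """Return the median by following the three steps in the text."""
    ordered = sorted(values)            # Step 1: order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:                      # Step 2: odd count, take the middle value
        return ordered[middle]
    # Step 3: even count, average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([4, 2, 3, 1]))   # 2.5
print(median([1, 2, 2, 3]))   # 2.0
```

Python's standard library offers the same calculation as `statistics.median`, so writing it by hand is only useful for seeing the steps.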

Which measure of center should you use, the mean or the median? It depends on the situation, but reporting both is never a bad idea. Suppose you're part of an NBA team trying to negotiate salaries. If you represent the owners, you want to show how much everyone is making and how much you're spending, so you want to take into account those superstar players and report the average. But if you're on the side of the players, you want to report the median, because that's more representative of what the players in the middle are making. Fifty percent of the players make a salary above the median, and 50% make a salary below the median.

When the mean and median are not close to each other in terms of their value, it's a good idea to report both and let the reader interpret the results from there. Also, as a general rule, be sure to ask for the median if you are only given the mean.
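To see how far apart the two measures can land, here is a small sketch using Python's standard `statistics` module. The salary figures are made up for illustration (they are not real NBA data), with one deliberately large value standing in for a superstar.

```python
from statistics import mean, median

# Hypothetical salaries in millions of dollars (illustrative only).
# The last value plays the role of a superstar outlier.
salaries = [1, 2, 2, 3, 4, 30]

average = mean(salaries)      # (1 + 2 + 2 + 3 + 4 + 30) / 6 = 7
middle = median(salaries)     # average of the two middle values, 2 and 3

print("mean:", average)
print("median:", middle)
```

In this made-up data set the mean is 7 while the median is 2.5, and five of the six players earn less than the mean. That gap is the kind of situation where reporting both numbers helps the reader.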

Measures of Variability

Variability is what the field of statistics is all about. Results vary from individual to individual, from group to group, from city to city, from moment to moment. Variation always exists in a data set, regardless of which characteristic you're measuring, because not every individual will have the same exact value for every characteristic you measure. Without a measure of variability you can't compare two data sets effectively. What if two sets of data have about the same average and the same median? Does that mean that the data are all the same? Not at all. For example, the data sets 199, 200, 201 and 0, 200, 400 have the same average, which is 200, and the same median, which is also 200. Yet they have very different amounts of variability. The first data set has a very small amount of variability compared to the second.
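A quick check of the two data sets above, sketched with Python's standard `statistics` module. The variable names are illustrative, and `stdev` (the sample standard deviation, the measure described next) is used here as one way to put a number on the spread.

```python
from statistics import mean, median, stdev

tight = [199, 200, 201]
spread = [0, 200, 400]

# Both data sets share the same center...
print(mean(tight), mean(spread))       # 200 200
print(median(tight), median(spread))   # 200 200

# ...but the spread around that center is very different.
print(stdev(tight))    # 1.0
print(stdev(spread))   # 200.0
```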

By far the most commonly used measure of variability is the standard deviation. The standard deviation of a data set, denoted by s, represents the typical distance from any point in the data set to the center. It's roughly the average distance from the center, and in this case, the center is the average. Most often, you don't hear a standard deviation given just by itself; if it's reported (and it's not reported nearly enough) it's usually in the fine print, in parentheses, like "(s = 2.68)."

The formula for the standard deviation of a data set is

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

To calculate s, do the following steps:

1. Find the average of the data set, $\bar{x}$.

To find the average, add up all the numbers and divide by the number of numbers in the data set, n.

 

2. For each number, subtract the average from it.

 

3. Square each of the differences.

 

4. Add up all the results from Step 3.

 

5. Divide the sum of squares (Step 4) by the number of numbers in the data set, minus one (n - 1).

If you do Steps 1 through 5 only, you have found another measure of variability, called the variance.

 

6. Take the square root of the variance. This is the standard deviation.

 

Suppose you have four numbers: 1, 3, 5, and 7. The mean is 16 ÷ 4 = 4. Subtracting the mean from each number, you get (1 - 4) = -3, (3 - 4) = -1, (5 - 4) = +1, and (7 - 4) = +3. Squaring the results you get 9, 1, 1, and 9, which sum to 20. Divide 20 by 4 - 1 = 3 to get 6.67. The standard deviation is the square root of 6.67, which is 2.58.
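The six steps can also be written out as a short Python sketch. The function name `sample_standard_deviation` is chosen here for illustration; the arithmetic follows the steps as listed, dividing by n - 1.

```python
import math


def sample_standard_deviation(values):
    """Compute s by following Steps 1 through 6 from the text."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    average = sum(values) / n                      # Step 1: find the average
    differences = [x - average for x in values]    # Step 2: subtract the average
    squares = [d ** 2 for d in differences]        # Step 3: square each difference
    sum_of_squares = sum(squares)                  # Step 4: add up the squares
    variance = sum_of_squares / (n - 1)            # Step 5: divide by n - 1 (the variance)
    return math.sqrt(variance)                     # Step 6: take the square root


s = sample_standard_deviation([1, 3, 5, 7])
print(round(s, 2))   # 2.58
```

The standard library's `statistics.stdev` uses the same n - 1 divisor, so it should agree with this function up to floating-point rounding.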

 

Here are some properties that can help you when interpreting a standard deviation:

The standard deviation can never be a negative number.

 

The smallest possible value for the standard deviation is 0 (when every number in the data set is exactly the same).

 

Standard deviation is affected by outliers, as it's based on distance from the mean, which is affected by outliers.
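These properties can be checked with a small sketch using `statistics.stdev` from Python's standard library. The data sets are made up for illustration; the last one replaces a single value with a large outlier.

```python
from statistics import stdev

identical = [5, 5, 5, 5]          # every number is exactly the same
without_outlier = [1, 3, 5, 7]
with_outlier = [1, 3, 5, 70]      # hypothetical data: one value is an extreme outlier

# The smallest possible standard deviation is 0.
print(stdev(identical))           # 0.0

# A single outlier pulls the mean toward it and inflates the standard deviation.
print(stdev(without_outlier) < stdev(with_outlier))   # True
```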
