5.1 Getting the Data to Fit a Common Ruler

Well, now we have a tool for measuring the spread of a set of data. This measuring stick, the standard deviation, is a useful tool for looking at the entire set of data. Notice that we are slowly building up more information about the data as a whole: We started with thousands of observations. Then we reduced these to a single statistic, the mean, which measures the typical data point. Next we added a little more information by using a boxplot, which really contains seven pieces of information (minimum, first quartile, median, mean, third quartile, maximum, outliers). Then we added the standard deviation to our arsenal, giving us quite a bit of information about the typical observations and how the rest of the data is spread out.

However, these tools really only help us look at a single variable in a set of data. The measuring stick for determining the spread of the data is different for every set of data. In essence, the standard deviation is a "ruler made of rubber"; it stretches to measure the spread of data that has a large range, and it contracts to measure the spread of data with a small range. What if we want to compare two sets of data? Better yet, what if we want to compare individual observations from two different variables in our data? How can we do this when all our tools are designed to change to fit the data? Is there no standard?
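The stretching of the rubber ruler is easy to see by measuring the same people in two different units. The short Python sketch below uses made-up heights (hypothetical data) and Python's standard `statistics.stdev`, which computes the sample standard deviation; the variable names are illustrative only.

```python
import statistics

# Hypothetical heights of five people, recorded in inches.
heights_in = [60.0, 64.0, 66.0, 70.0, 75.0]
# The same five heights converted to centimeters (1 inch = 2.54 cm).
heights_cm = [h * 2.54 for h in heights_in]

sd_in = statistics.stdev(heights_in)  # sample standard deviation, in inches
sd_cm = statistics.stdev(heights_cm)  # sample standard deviation, in centimeters

print(f"SD in inches:      {sd_in:.3f}")
print(f"SD in centimeters: {sd_cm:.3f}")
# Same people, same spread, yet the ruler has stretched by the unit factor.
print(f"ratio: {sd_cm / sd_in:.3f}")
```

Nothing about the people changed, but the standard deviation in centimeters is 2.54 times the one in inches, so a "one standard deviation" yardstick taken from one variable cannot be laid against another.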

As a matter of fact, there is a standard ruler that applies to all data, regardless of its size or units. Each observation in a set of data can be converted to what is called a standard score, also known as a z-score. This converts each observation to a dimensionless number on a common ruler. Once this is done, we can compare z-scores for observations from different variables and determine which observation is farther from the mean of its own variable, measured in standard deviations rather than in the original units.
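One way to check the claim that z-scores do not depend on units is to compute them for the same hypothetical heights in inches and in centimeters. This Python sketch assumes a z-score is (observation minus mean) divided by the sample standard deviation; some texts divide by the population standard deviation instead (`statistics.pstdev`), which would change the numbers slightly but not the conclusion.

```python
import statistics

def z_scores(data):
    """Return the z-score of every observation in data.

    Assumption: uses the sample standard deviation (statistics.stdev).
    """
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [(x - mean) / sd for x in data]

# Hypothetical heights in inches, and the same heights in centimeters.
heights_in = [60.0, 64.0, 66.0, 70.0, 75.0]
heights_cm = [h * 2.54 for h in heights_in]

z_in = z_scores(heights_in)
z_cm = z_scores(heights_cm)

# The two columns agree: the unit factor cancels between numerator and denominator.
for a, b in zip(z_in, z_cm):
    print(f"{a:+.3f}  {b:+.3f}")
```

Because both the distance from the mean and the standard deviation are multiplied by 2.54, the factor cancels, which is why a z-score carries no units.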

It is important to realize that the concept of a z-score is fundamentally different from the other statistics we have discussed so far. The mean, the standard deviation, and the statistics shown on a boxplot are descriptive statistics for an entire set of data. On the other hand, each observation has its own z-score; thus, z-scores describe individual observations rather than the data set as a whole. At the same time, a z-score is a comparative number: it shows how a particular observation compares to the entire data set. Essentially, a z-score tells you how many standard deviations (or fractions of a standard deviation) an observation lies above or below the mean of the data.
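To see how a z-score lets us compare observations from two different variables, the Python sketch below uses hypothetical heights (inches) and weights (pounds) for the same five people. The helper `z_score` is an illustrative name, not part of any library, and it assumes the sample standard deviation.

```python
import statistics

def z_score(x, data):
    """Return how many standard deviations x lies above (+) or below (-) the mean of data."""
    # Assumption: sample standard deviation; statistics.pstdev gives the population version.
    return (x - statistics.mean(data)) / statistics.stdev(data)

# Hypothetical measurements on the same five people.
heights = [60, 64, 66, 70, 75]       # inches; mean 67
weights = [110, 130, 150, 170, 190]  # pounds; mean 150

# The fifth person is above average on both variables.
z_height = z_score(heights[4], heights)
z_weight = z_score(weights[4], weights)

print(f"height z = {z_height:.2f}")
print(f"weight z = {z_weight:.2f}")

# Inches and pounds cannot be compared directly, but z-scores can:
# the larger |z| marks the observation farther from its own mean.
if abs(z_height) > abs(z_weight):
    print("The height is the more unusual observation.")
else:
    print("The weight is the more unusual observation.")
```

Here the height of 75 inches is 8 inches above its mean (z ≈ 1.39), while the weight of 190 pounds is 40 pounds above its mean (z ≈ 1.26). Even though 40 is the bigger raw number, the height is the more unusual observation once each is measured against its own spread.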

  5.1.1 Definitions and Formulas
  5.1.2 Worked Examples
  5.1.3 Exploration 5A: Cool Toys for Tots