5.1.2 Worked Examples


Example 5.1. Converting Observations to Z-scores
Let’s return to the data from example 2. Remember that the mean is 6 and the standard deviation (example 3) is approximately 1.852. To calculate the z-scores, we take the observation, subtract the mean, and divide the result by the standard deviation. A few of these are done for you. Fill in the rest of the table on your own.

\[
z_1 = \frac{x_1 - \bar{x}}{s_x} = \frac{3 - 6}{1.852} = \frac{-3}{1.852} \approx -1.620
\qquad\qquad
z_{11} = \frac{x_{11} - \bar{x}}{s_x} = \frac{7 - 6}{1.852} = \frac{1}{1.852} \approx 0.540
\]

i        |    1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 |  11 | 12 | 13 | 14 | 15
x_i      |    3 |  3 |  4 |  5 |  5 |  5 |  6 |  6 |  6 |  7 |   7 |  8 |  8 |  8 |  9
x_i - x̄  |   -3 | -3 | -2 | -1 | -1 | -1 |  0 |  0 |  0 |  1 |   1 |  2 |  2 |  2 |  3
z_i      | -1.6 |    |    |    |    |    |    |    |    |    | 0.5 |    |    |    |

From this, we see that the smallest observations in the data (x1 and x2) are between 1 and 2 standard deviations below the mean, since the z-scores for these observations are between -1 and -2. What do you predict will happen when you add all the z-scores up? Why? Is this generally true, or special for this set of data?
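
One way to check both your completed table and your prediction about the sum is a short Python sketch along these lines (the data and the sample standard deviation are the same ones used above):

```python
from statistics import mean, stdev

# The 15 observations from example 2
data = [3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9]

x_bar = mean(data)   # 6
s_x = stdev(data)    # sample standard deviation, roughly 1.852

# z-score: subtract the mean, then divide by the standard deviation
z = [(x - x_bar) / s_x for x in data]

print([round(zi, 2) for zi in z])
print("sum of the z-scores:", round(sum(z), 6))
```

Running it confirms the z-scores and shows what their sum comes out to; think about why that happens and whether it must happen for every set of data.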

Is this data from a normal distribution? That's harder to answer. With so few data points, we will have a hard time matching the rules of thumb exactly. If you've calculated all the z-scores correctly, you should find that 8 of the 15 observations have z-scores between -1 and +1. This accounts for 8/15 = 53.33% of the data, which is a little short of expectation: if this were a normal distribution, we would expect closer to 68% of 15 = 0.68 * 15 = 10.2 observations in that range. On the other hand, 15/15 = 100% of the data has z-scores between -2 and +2, slightly more than the 95% from the rules of thumb, but since 95% of 15 = 0.95 * 15 = 14.25, we are close to the right number of observations there. Overall, then, our best guess is that this data is not normally distributed: there are not quite enough observations near the mean. To say anything more definite, we would need other mathematical tools; with only 15 observations, it is entirely possible that the data really is normally distributed and we simply cannot tell.
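
These counts can also be checked mechanically. The sketch below recomputes the z-scores and tallies how many observations fall within one, two, and three standard deviations of the mean, next to the proportions the rules of thumb predict:

```python
from statistics import mean, stdev

data = [3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9]
x_bar, s_x = mean(data), stdev(data)
z = [(x - x_bar) / s_x for x in data]
n = len(data)

# Compare the observed proportions with the 68-95-99.7 rules of thumb
for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    count = sum(1 for zi in z if abs(zi) <= k)
    print(f"within {k} sd: {count}/{n} = {count/n:.2%} "
          f"(rule of thumb: {rule:.1%}, about {rule * n:.2f} observations)")
```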


Example 5.2. Converting from Z-scores Back to the Data
Notice that z-scores, regardless of the units of the original data, have no units themselves. This is because the units cancel out. Suppose we have data measured in dollars. Then the units for x_i, the mean x̄, the deviations from the mean, and s_x are all dollars, so when we compute the z-scores, the units vanish:

\[
z_i = \frac{\text{deviation}}{\text{standard deviation}} = \frac{\text{dollars}}{\text{dollars}} = \text{no units}
\]

This is the power of the z-scores: they are dimensionless, so they can be used to compare data with completely different units and sizes. Everything is placed onto a standard measuring stick that is "one standard deviation long," no matter how big the standard deviation is for a given set of data.

But suppose all we know is that a particular observation has a z-score of 1.2. What is the original data value? We know the observation is 1.2 standard deviations above the mean, so if we take the mean and add 1.2 standard deviations, we recover the original observation. To answer this question about a specific observation, then, we need to know two things about the entire set of data: the mean and the standard deviation. For example, if the data represents scores on a test in one class, and the class earned a mean score of 55 points with a standard deviation of 8 points, then a student with a standard score of 1.2 would have a real score of 55 points + 1.2(8 points) = 55 points + 9.6 points = 64.6 points. If a student in another class had a z-score of 1.5, then we know that the second student did better compared to his or her class than the first student did, because the z-score is higher. Even if the second class has a lower mean and a lower standard deviation, the second student performed better relative to his or her classmates than the first student did.
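
Here is a quick sketch of this conversion in Python. The first class's mean of 55 points and standard deviation of 8 points come from the example above; the second class's mean of 48 and standard deviation of 6 are hypothetical numbers, included only to illustrate the comparison.

```python
def raw_score(z, mean, sd):
    """Convert a z-score back to the original units: mean + z * sd."""
    return mean + z * sd

# First student: z = 1.2 in a class with mean 55 points and sd 8 points
print(raw_score(1.2, 55, 8))   # 64.6 points

# Second student: z = 1.5 in another class; this class's mean (48) and
# standard deviation (6) are hypothetical, chosen only for illustration
print(raw_score(1.5, 48, 6))   # 57.0 points: a lower raw score, but a
                               # better standing relative to that class
```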

On standardized tests, your score is usually reported as a type of z-score, rather than a raw score. All you really know is where your score sits relative to the mean and spread of the entire set of test-takers.


Example 5.3. Checking Rules of Thumb for the Sales Data
Does our sales data from the Cool Toys for Tots chain follow the rules of thumb for normally distributed data? (See the new version of the data in C05 Tots.xls [.rda].) We can ask this question from two different perspectives: by region, or across the entire chain. Let's look at the chain as a whole, including both the northeast and north central regions. First, we'll compute the z-scores for the stores in the chain in order to convert all the data to a common ruler. For this, we'll need the mean and standard deviation of the whole chain, rather than the individual means and standard deviations for each region. The final result is shown in the table below:





Store   Region   Sales   Sales z-score
1 NC $668,694.31 2.0100
2 NC $515,539.13 1.1483
3 NC $313,879.39 0.0138
4 NC $345,156.13 0.1898
5 NC $245,182.96 -0.3727
6 NC $273,000.00 -0.2162
7 NC $135,000.00 -0.9926
8 NC $222,973.44 -0.4977
9 NC $161,632.85 -0.8428
10 NC $373,742.75 0.3506
11 NC $171,235.07 -0.7887
12 NC $215,000.00 -0.5425
13 NC $276,659.53 -0.1956
14 NC $302,689.11 -0.0492
15 NC $244,067.77 -0.3790
16 NC $193,000.00 -0.6663
17 NC $141,903.82 -0.9538
18 NC $393,047.98 0.4592
19 NC $507,595.76 1.1036
20 NE $95,643.20 -1.2140
21 NE $80,000.00 -1.3020
22 NE $543,779.27 1.3072
23 NE $499,883.07 1.0603
24 NE $173,461.46 -0.7762
25 NE $581,738.16 1.5208
26 NE $189,368.10 -0.6867
27 NE $485,344.87 0.9785
28 NE $122,256.49 -1.0643
29 NE $370,026.87 0.3297
30 NE $140,251.25 -0.9630
31 NE $314,737.79 0.0186
32 NE $134,896.35 -0.9932
33 NE $438,995.30 0.7177
34 NE $211,211.90 -0.5638
35 NE $818,405.93 2.8523
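
If you would like to reproduce the z column yourself, a sketch along the following lines works. It types in the 35 sales figures from the table and uses the chain-wide sample mean and sample standard deviation (the same convention as in example 3), so small differences from the table are possible because of rounding.

```python
from statistics import mean, stdev

# Sales for all 35 stores (north central stores 1-19, then northeast 20-35)
sales = [
    668694.31, 515539.13, 313879.39, 345156.13, 245182.96, 273000.00,
    135000.00, 222973.44, 161632.85, 373742.75, 171235.07, 215000.00,
    276659.53, 302689.11, 244067.77, 193000.00, 141903.82, 393047.98,
    507595.76, 95643.20, 80000.00, 543779.27, 499883.07, 173461.46,
    581738.16, 189368.10, 485344.87, 122256.49, 370026.87, 140251.25,
    314737.79, 134896.35, 438995.30, 211211.90, 818405.93,
]

x_bar = mean(sales)   # chain-wide mean
s_x = stdev(sales)    # chain-wide sample standard deviation

z_scores = [(x - x_bar) / s_x for x in sales]
for store, z in enumerate(z_scores, start=1):
    print(f"store {store:2d}: z = {z:7.4f}")
```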




To check the rules of thumb, we need to determine how many stores fall into each band (within one, two, and three standard deviations of the mean) using the z-scores.

Overall, this data is fairly close to satisfying the rules of thumb for being normally distributed. There seems to be one observation too many within one standard deviation of the mean, but that is generally acceptable. (Note: 71.43% - 68% = 3.43%, and 3.43% of 35 ≈ 1.2 observations.) With only 35 data points, these results are very close to what one would expect from normally distributed data. Is the data symmetric? If it is, there should be roughly the same number of stores above the mean as below it.
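
A short sketch makes the tallies concrete. It works directly from the z column in the table above, so you can compare the counts with the 68%, 95%, and 99.7% rules of thumb and check the symmetry question at the same time:

```python
# z-scores for the 35 stores, copied from the table above
z = [
     2.0100,  1.1483,  0.0138,  0.1898, -0.3727, -0.2162, -0.9926,
    -0.4977, -0.8428,  0.3506, -0.7887, -0.5425, -0.1956, -0.0492,
    -0.3790, -0.6663, -0.9538,  0.4592,  1.1036, -1.2140, -1.3020,
     1.3072,  1.0603, -0.7762,  1.5208, -0.6867,  0.9785, -1.0643,
     0.3297, -0.9630,  0.0186, -0.9932,  0.7177, -0.5638,  2.8523,
]
n = len(z)

# How many stores fall within 1, 2, and 3 standard deviations of the mean?
for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    count = sum(1 for zi in z if abs(zi) <= k)
    print(f"within {k} sd: {count}/{n} = {count/n:.2%}  (rule of thumb: {rule:.1%})")

# Symmetry check: how many stores sit above versus below the chain mean?
above = sum(1 for zi in z if zi > 0)
print(f"above the mean: {above} stores, below the mean: {n - above} stores")
```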