The following examples deal with data collected at Over Easy using the observational data forms from the last chapter. The data can be found in the file C03 OverEasy2.xls [.rda].
Example 3.5. Reading a one-variable pivot table
Given the data on counts of people at OverEasy for the period shown in the data, it might be nice to see how many people typically sit in each area of the restaurant (Nook, Cranny and Hole). We could go through the data and count these up ourselves, or sort the data and look at it, but in a few seconds, our software can produce a pivot table doing this for us. Moreover, it is then easy to change the way the data is displayed and cross-sectioned. For example, the table below shows the average number of people in each area of the restaurant. From this, we see that one area of the restaurant, the Hole section, has many fewer people in it, on average.
Average of Count | |
Location | Total |
Cranny | 18.604 |
Hole | 12.176 |
Nook | 22.069 |
Grand Total | 17.616 |
However, we do not know how significant these differences are. It would help to know how variable the number of people in each section is. By changing the display to show the standard deviation of the counts of people in each section, we get the table below. Notice that this shows that the overall spread of the data is about 12 people, but that the Hole area is much lower at about 9 people. So not only is the average number of people in that area low, but the spread is much less, indicating that we have consistently fewer people in that area of the restaurant.
StdDev of Count | |
Location | Total |
Cranny | 12.075 |
Hole | 9.042 |
Nook | 12.549 |
Grand Total | 12.046 |
Realistically, though, we should only be looking at the total count of people in each area if the areas are equal in size. Assuming they are, the above analysis holds; if they are not equal, and if, for example, the Hole area is smaller than the other two, then we would of course expect lower numbers in that area. In the case where the numbers have different maximum values, we need to find a number that is more stable across all the areas of the restaurant. For example, from the observational data, we could create a new variable called ”PercentCapacity” that is the actual number of people divided by the maximum capacity of that area in the restaurant. These numbers would all range from 0% to 100%, making them comparable. For now, though, we assume that all the areas are the same size, which leads us to conclude that for some reason, people don’t seem to want to use the Hole area of the restaurant.
To find out why this is we would need to collect more data. Perhaps people do not like the decor in that area. Or maybe it’s farther from the door, so that people fill up from the front of the restaurant on back. Maybe there’s no view out of the windows in that area, or it is too close to the kitchen or bathrooms, making it noisy.
Example 3.6. Pivot tables in two variables
Let us first revisit the example pivot table from the discussion above (table 3.1.) It shows the average number of people in each section of the restaurant, broken down by two variables: day of the week and location. This lets us explore more patterns in the data and see more clearly what is happening.
Notice that the table shows how much lower the couts are in the Hole area - on every day of the week. And for several days (W = Wednesday, H = Thursday, and F = Friday) the counts in the Hole are about 50% of the counts in either of the other areas. We can also see in the last column that our busiest days tend to be S = Saturday and D = Sunday. Using this information, we could more easily plan how many servers to position in each area each day of the week. But these numbers will clearly change. How much can we expect them to change? The table below displays the standard deviation of the counts, broken down by day and location. Overall, we see a lot of variation in the counts.
StdDev of Count | Location | |||
Day | Cranny | Hole | Nook | Grand Total |
D | 16.331 | 12.153 | 16.138 | 15.538 |
F | 11.864 | 6.031 | 12.474 | 12.023 |
H | 10.138 | 5.804 | 12.245 | 11.275 |
M | 10.171 | 7.860 | 10.179 | 9.874 |
S | 11.748 | 9.791 | 12.050 | 11.554 |
T | 10.027 | 7.956 | 9.434 | 9.540 |
W | 7.982 | 6.175 | 11.603 | 10.402 |
Grand Total | 12.075 | 9.042 | 12.549 | 12.046 |
Another way to view the data in a pivot table is to look at the data not as counts (the default, usually useful only if the data variable is categorical) or as averages or standard deviations, but as a percentage. Most commonly, we would represent the data either as a percentage of the row or the column variable. For example, the table below displays the average counts, but as a percentage of the row variable (day of the week) showing us what percentage of our customers are in each of the three areas of the restaurant. Such as view of the data lets us quickly see - on a common scale - which areas are most and least popular.
These percentages might be useful for many things that the raw numbers would not directly show. For example, by looking at who is where through a percentage, we can make reasonable staffing decisions: if 25% of the customers are in the Hole on Thursday, then 25% of our staff should be in that area. In addition, this also puts all the areas on an equal footing.
Sum of Count | Location | |||
Day | Cranny | Hole | Nook | Grand Total |
D | 35.95% | 25.65% | 38.40% | 100.00% |
F | 38.97% | 17.32% | 43.71% | 100.00% |
H | 35.22% | 18.00% | 46.77% | 100.00% |
M | 35.17% | 25.29% | 39.54% | 100.00% |
S | 34.32% | 27.14% | 38.54% | 100.00% |
T | 35.62% | 25.73% | 38.65% | 100.00% |
W | 30.31% | 19.92% | 49.78% | 100.00% |
Grand Total | 35.20% | 23.04% | 41.76% | 100.00% |
Example 3.7. Large pivot tables and grouping
Our data on the counts at Over Easy also includes the time of day. It might be interesting to look at the data using this as one of our variables, but there are a lot of times during the day. A quick two-variable pivot table looking at time and location gives us the rather long table shown below. The length is due to the large number of values for the variable ”Time”. If our variable had been a continuous numerical variable instead, we could have an even bigger table, with one row for each different value of the variable.
Average of Count | Location | |||
Time | Cranny | Hole | Nook | Grand Total |
500 | 3.91 | 2.60 | 5.95 | 4.15 |
530 | 16.69 | 10.84 | 20.01 | 15.85 |
600 | 31.03 | 20.59 | 36.51 | 29.38 |
630 | 32.83 | 22.33 | 36.94 | 30.70 |
700 | 28.43 | 18.73 | 32.63 | 26.60 |
730 | 25.19 | 16.88 | 29.48 | 23.85 |
800 | 16.17 | 10.72 | 19.43 | 15.44 |
830 | 12.51 | 8.36 | 15.90 | 12.26 |
900 | 6.67 | 4.27 | 8.99 | 6.64 |
930 | 5.42 | 3.69 | 7.63 | 5.58 |
1000 | 6.56 | 4.52 | 8.85 | 6.64 |
1030 | 8.71 | 5.67 | 11.19 | 8.52 |
1100 | 16.36 | 10.70 | 19.87 | 15.64 |
1130 | 20.63 | 13.17 | 24.20 | 19.33 |
1200 | 29.29 | 18.77 | 33.52 | 27.19 |
1230 | 31.54 | 19.85 | 36.02 | 29.13 |
1300 | 31.06 | 19.27 | 35.37 | 28.57 |
1330 | 26.52 | 17.85 | 30.84 | 25.07 |
1400 | 3.96 | 2.55 | 6.00 | 4.17 |
Grand Total | 18.60 | 12.18 | 22.07 | 17.62 |
Large tables are generally undesirable. They are harder to read and harder to interpret. Usually, though, when you use a numerical variable as either the column or row variable in a pivot table, there is a way to group the values of the variable to make it easier to read the table. In this case, we could easily group all the times between 500 and 1000 into a ”Breakfast” group all the other times into a ”Lunch” group.
Average of Count | Location | ||||
Time2 | Time | Cranny | Hole | Nook | Grand Total |
Group1 | 500 | 3.91 | 2.60 | 5.95 | 4.15 |
530 | 16.69 | 10.84 | 20.01 | 15.85 | |
600 | 31.03 | 20.59 | 36.51 | 29.38 | |
630 | 32.83 | 22.33 | 36.94 | 30.70 | |
700 | 28.43 | 18.73 | 32.63 | 26.60 | |
730 | 25.19 | 16.88 | 29.48 | 23.85 | |
800 | 16.17 | 10.72 | 19.43 | 15.44 | |
830 | 12.51 | 8.36 | 15.90 | 12.26 | |
900 | 6.67 | 4.27 | 8.99 | 6.64 | |
930 | 5.42 | 3.69 | 7.63 | 5.58 | |
Group2 | 1000 | 6.56 | 4.52 | 8.85 | 6.64 |
1030 | 8.71 | 5.67 | 11.19 | 8.52 | |
1100 | 16.36 | 10.70 | 19.87 | 15.64 | |
1130 | 20.63 | 13.17 | 24.20 | 19.33 | |
1200 | 29.29 | 18.77 | 33.52 | 27.19 | |
1230 | 31.54 | 19.85 | 36.02 | 29.13 | |
1300 | 31.06 | 19.27 | 35.37 | 28.57 | |
1330 | 26.52 | 17.85 | 30.84 | 25.07 | |
1400 | 3.96 | 2.55 | 6.00 | 4.17 | |
Grand Total | 18.60 | 12.18 | 22.07 | 17.62 | |
At first glance, this has not really helped. The table is in fact larger by one column. But by hiding the details of a grouped variable, we can get a much smaller table, letting us quickly compare the morning (Group 1) and noon (Group 2) rushes.
Average of Count | Location | ||||
Time2 | Time | Cranny | Hole | Nook | Grand Total |
Group1 | 17.88 | 11.90 | 21.35 | 17.04 | |
Group2 | 19.40 | 12.48 | 22.87 | 18.25 | |
Grand Total | 18.60 | 12.18 | 22.07 | 17.62 | |
As a final way of looking at this data, consider using the count variable itself as a row variable. The counts in each section range from 0 to 65, leaving us with 66 rows in such a table. But by grouping them (which can easily be done with a numerical variable like ”Count”) we can quickly see how many 30 minute blocks of time (observations) in each area of the restaurant fell into each grouping of the number of patrons at our restaurant.
Count of Count | Location | |||
Count | Cranny | Hole | Nook | Grand Total |
0-9 | 675 | 1030 | 517 | 2222 |
10-19 | 529 | 672 | 464 | 1665 |
20-29 | 498 | 311 | 516 | 1325 |
30-39 | 303 | 102 | 421 | 826 |
40-49 | 97 | 12 | 178 | 287 |
50-59 | 25 | 1 | 30 | 56 |
60-69 | 1 | 2 | 3 | |
Grand Total | 2128 | 2128 | 2128 | 6384 |
Essentially, this type of pivot table is a frequency table of the variable ”Count”. These are useful for creating histograms and other visual representations of one-variable data, which are discussed in chapter 5.