3.2.2 Worked Examples

The following examples deal with data collected at Over Easy using the observational data forms from the last chapter. The data can be found in the file C03 OverEasy2.xls [.rda].


Example 3.5. Reading a one-variable pivot table

Given the data on counts of people at OverEasy for the period shown in the data, it might be nice to see how many people typically sit in each area of the restaurant (Nook, Cranny and Hole). We could go through the data and count these up ourselves, or sort the data and look at it, but in a few seconds, our software can produce a pivot table doing this for us. Moreover, it is then easy to change the way the data is displayed and cross-sectioned. For example, the table below shows the average number of people in each area of the restaurant. From this, we see that one area of the restaurant, the Hole section, has many fewer people in it, on average.



Average of Count


Location Total


Cranny 18.604


Hole 12.176


Nook 22.069


Grand Total 17.616


However, we do not know how significant these differences are. It would help to know how variable the number of people in each section is. By changing the display to show the standard deviation of the counts of people in each section, we get the table below. Notice that this shows that the overall spread of the data is about 12 people, but that the Hole area is much lower at about 9 people. So not only is the average number of people in that area low, but the spread is much less, indicating that we have consistently fewer people in that area of the restaurant.



StdDev of Count


Location Total


Cranny 12.075


Hole 9.042


Nook 12.549


Grand Total 12.046


Realistically, though, we should only be looking at the total count of people in each area if the areas are equal in size. Assuming they are, the above analysis holds; if they are not equal, and if, for example, the Hole area is smaller than the other two, then we would of course expect lower numbers in that area. In the case where the numbers have different maximum values, we need to find a number that is more stable across all the areas of the restaurant. For example, from the observational data, we could create a new variable called ”PercentCapacity” that is the actual number of people divided by the maximum capacity of that area in the restaurant. These numbers would all range from 0% to 100%, making them comparable. For now, though, we assume that all the areas are the same size, which leads us to conclude that for some reason, people don’t seem to want to use the Hole area of the restaurant.

To find out why this is we would need to collect more data. Perhaps people do not like the decor in that area. Or maybe it’s farther from the door, so that people fill up from the front of the restaurant on back. Maybe there’s no view out of the windows in that area, or it is too close to the kitchen or bathrooms, making it noisy.


Example 3.6. Pivot tables in two variables

Let us first revisit the example pivot table from the discussion above (table 3.1.) It shows the average number of people in each section of the restaurant, broken down by two variables: day of the week and location. This lets us explore more patterns in the data and see more clearly what is happening.

Notice that the table shows how much lower the couts are in the Hole area - on every day of the week. And for several days (W = Wednesday, H = Thursday, and F = Friday) the counts in the Hole are about 50% of the counts in either of the other areas. We can also see in the last column that our busiest days tend to be S = Saturday and D = Sunday. Using this information, we could more easily plan how many servers to position in each area each day of the week. But these numbers will clearly change. How much can we expect them to change? The table below displays the standard deviation of the counts, broken down by day and location. Overall, we see a lot of variation in the counts.






StdDev of Count
Location





Day Cranny Hole Nook Grand Total





D 16.331 12.153 16.138 15.538





F 11.864 6.031 12.474 12.023





H 10.138 5.804 12.245 11.275





M 10.171 7.860 10.179 9.874





S 11.748 9.791 12.050 11.554





T 10.027 7.956 9.434 9.540





W 7.982 6.175 11.603 10.402





Grand Total 12.075 9.042 12.549 12.046





Another way to view the data in a pivot table is to look at the data not as counts (the default, usually useful only if the data variable is categorical) or as averages or standard deviations, but as a percentage. Most commonly, we would represent the data either as a percentage of the row or the column variable. For example, the table below displays the average counts, but as a percentage of the row variable (day of the week) showing us what percentage of our customers are in each of the three areas of the restaurant. Such as view of the data lets us quickly see - on a common scale - which areas are most and least popular.

These percentages might be useful for many things that the raw numbers would not directly show. For example, by looking at who is where through a percentage, we can make reasonable staffing decisions: if 25% of the customers are in the Hole on Thursday, then 25% of our staff should be in that area. In addition, this also puts all the areas on an equal footing.






Sum of Count Location





Day Cranny Hole Nook Grand Total





D 35.95% 25.65% 38.40% 100.00%





F 38.97% 17.32% 43.71% 100.00%





H 35.22% 18.00% 46.77% 100.00%





M 35.17% 25.29% 39.54% 100.00%





S 34.32% 27.14% 38.54% 100.00%





T 35.62% 25.73% 38.65% 100.00%





W 30.31% 19.92% 49.78% 100.00%





Grand Total 35.20% 23.04% 41.76% 100.00%






Example 3.7. Large pivot tables and grouping

Our data on the counts at Over Easy also includes the time of day. It might be interesting to look at the data using this as one of our variables, but there are a lot of times during the day. A quick two-variable pivot table looking at time and location gives us the rather long table shown below. The length is due to the large number of values for the variable ”Time”. If our variable had been a continuous numerical variable instead, we could have an even bigger table, with one row for each different value of the variable.






Average of Count
Location





Time Cranny Hole Nook Grand Total





500 3.91 2.60 5.95 4.15





530 16.69 10.84 20.01 15.85





600 31.03 20.59 36.51 29.38





630 32.83 22.33 36.94 30.70





700 28.43 18.73 32.63 26.60





730 25.19 16.88 29.48 23.85





800 16.17 10.72 19.43 15.44





830 12.51 8.36 15.90 12.26





900 6.67 4.27 8.99 6.64





930 5.42 3.69 7.63 5.58





1000 6.56 4.52 8.85 6.64





1030 8.71 5.67 11.19 8.52





1100 16.36 10.70 19.87 15.64





1130 20.63 13.17 24.20 19.33





1200 29.29 18.77 33.52 27.19





1230 31.54 19.85 36.02 29.13





1300 31.06 19.27 35.37 28.57





1330 26.52 17.85 30.84 25.07





1400 3.96 2.55 6.00 4.17





Grand Total 18.60 12.18 22.07 17.62





Large tables are generally undesirable. They are harder to read and harder to interpret. Usually, though, when you use a numerical variable as either the column or row variable in a pivot table, there is a way to group the values of the variable to make it easier to read the table. In this case, we could easily group all the times between 500 and 1000 into a ”Breakfast” group all the other times into a ”Lunch” group.







Average of Count
Location






Time2 Time Cranny Hole Nook Grand Total






Group1 500 3.91 2.60 5.95 4.15






530 16.69 10.84 20.01 15.85






600 31.03 20.59 36.51 29.38






630 32.83 22.33 36.94 30.70






700 28.43 18.73 32.63 26.60






730 25.19 16.88 29.48 23.85






800 16.17 10.72 19.43 15.44






830 12.51 8.36 15.90 12.26






900 6.67 4.27 8.99 6.64






930 5.42 3.69 7.63 5.58






Group2 1000 6.56 4.52 8.85 6.64






1030 8.71 5.67 11.19 8.52






1100 16.36 10.70 19.87 15.64






1130 20.63 13.17 24.20 19.33






1200 29.29 18.77 33.52 27.19






1230 31.54 19.85 36.02 29.13






1300 31.06 19.27 35.37 28.57






1330 26.52 17.85 30.84 25.07






1400 3.96 2.55 6.00 4.17






Grand Total 18.60 12.18 22.07 17.62






At first glance, this has not really helped. The table is in fact larger by one column. But by hiding the details of a grouped variable, we can get a much smaller table, letting us quickly compare the morning (Group 1) and noon (Group 2) rushes.







Average of Count
Location






Time2 Time Cranny Hole Nook Grand Total






Group1 17.88 11.90 21.35 17.04






Group2 19.40 12.48 22.87 18.25






Grand Total 18.60 12.18 22.07 17.62






As a final way of looking at this data, consider using the count variable itself as a row variable. The counts in each section range from 0 to 65, leaving us with 66 rows in such a table. But by grouping them (which can easily be done with a numerical variable like ”Count”) we can quickly see how many 30 minute blocks of time (observations) in each area of the restaurant fell into each grouping of the number of patrons at our restaurant.






Count of Count
Location





Count Cranny Hole Nook Grand Total





0-9 675 1030 517 2222





10-19 529 672 464 1665





20-29 498 311 516 1325





30-39 303 102 421 826





40-49 97 12 178 287





50-59 25 1 30 56





60-69 1 2 3





Grand Total 2128 2128 2128 6384





Essentially, this type of pivot table is a frequency table of the variable ”Count”. These are useful for creating histograms and other visual representations of one-variable data, which are discussed in chapter 5.