Worked Examples

The following examples deal with data collected at Over Easy using the observational data forms from the last chapter. The data can be found in the file C03 OverEasy2.xls [.rda].

Given the data on counts of people at OverEasy for the period shown in the data, it might be nice to see how many people typically sit in each area of the restaurant (Nook, Cranny and Hole). We could go through the data and count these up ourselves, or sort the data and look at it, but in a few seconds, our software can produce a pivot table doing this for us. Moreover, it is then easy to change the way the data is displayed and cross-sectioned. For example, the table below shows the average number of people in each area of the restaurant. From this, we see that one area of the restaurant, the Hole section, has many fewer people in it, on average.


Average of Count

Location	Total

Cranny	18.604

Hole	12.176

Nook	22.069

Grand Total	17.616

However, we do not know how significant these differences are. It would help to know how variable the number of people in each section is. By changing the display to show the standard deviation of the counts of people in each section, we get the table below. Notice that this shows that the overall spread of the data is about 12 people, but that the Hole area is much lower at about 9 people. So not only is the average number of people in that area low, but the spread is much less, indicating that we have consistently fewer people in that area of the restaurant.


StdDev of Count

Location	Total

Cranny	12.075

Hole	9.042

Nook	12.549

Grand Total	12.046

Realistically, though, we should only be looking at the total count of people in each area if the areas are equal in size. Assuming they are, the above analysis holds; if they are not equal, and if, for example, the Hole area is smaller than the other two, then we would of course expect lower numbers in that area. In the case where the numbers have different maximum values, we need to find a number that is more stable across all the areas of the restaurant. For example, from the observational data, we could create a new variable called ”PercentCapacity” that is the actual number of people divided by the maximum capacity of that area in the restaurant. These numbers would all range from 0% to 100%, making them comparable. For now, though, we assume that all the areas are the same size, which leads us to conclude that for some reason, people don’t seem to want to use the Hole area of the restaurant.

To find out why this is we would need to collect more data. Perhaps people do not like the decor in that area. Or maybe it’s farther from the door, so that people fill up from the front of the restaurant on back. Maybe there’s no view out of the windows in that area, or it is too close to the kitchen or bathrooms, making it noisy.

Let us first revisit the example pivot table from the discussion above (table 3.1.) It shows the average number of people in each section of the restaurant, broken down by two variables: day of the week and location. This lets us explore more patterns in the data and see more clearly what is happening.

Notice that the table shows how much lower the couts are in the Hole area - on every day of the week. And for several days (W = Wednesday, H = Thursday, and F = Friday) the counts in the Hole are about 50% of the counts in either of the other areas. We can also see in the last column that our busiest days tend to be S = Saturday and D = Sunday. Using this information, we could more easily plan how many servers to position in each area each day of the week. But these numbers will clearly change. How much can we expect them to change? The table below displays the standard deviation of the counts, broken down by day and location. Overall, we see a lot of variation in the counts.


StdDev of Count	Location

Day	Cranny	Hole	Nook	Grand Total

D	16.331	12.153	16.138	15.538

F	11.864	6.031	12.474	12.023

H	10.138	5.804	12.245	11.275

M	10.171	7.860	10.179	9.874

S	11.748	9.791	12.050	11.554

T	10.027	7.956	9.434	9.540

W	7.982	6.175	11.603	10.402

Grand Total	12.075	9.042	12.549	12.046

Another way to view the data in a pivot table is to look at the data not as counts (the default, usually useful only if the data variable is categorical) or as averages or standard deviations, but as a percentage. Most commonly, we would represent the data either as a percentage of the row or the column variable. For example, the table below displays the average counts, but as a percentage of the row variable (day of the week) showing us what percentage of our customers are in each of the three areas of the restaurant. Such as view of the data lets us quickly see - on a common scale - which areas are most and least popular.

These percentages might be useful for many things that the raw numbers would not directly show. For example, by looking at who is where through a percentage, we can make reasonable staffing decisions: if 25% of the customers are in the Hole on Thursday, then 25% of our staff should be in that area. In addition, this also puts all the areas on an equal footing.


Sum of Count	Location

Day	Cranny	Hole	Nook	Grand Total

D	35.95%	25.65%	38.40%	100.00%

F	38.97%	17.32%	43.71%	100.00%

H	35.22%	18.00%	46.77%	100.00%

M	35.17%	25.29%	39.54%	100.00%

S	34.32%	27.14%	38.54%	100.00%

T	35.62%	25.73%	38.65%	100.00%

W	30.31%	19.92%	49.78%	100.00%

Grand Total	35.20%	23.04%	41.76%	100.00%

Our data on the counts at Over Easy also includes the time of day. It might be interesting to look at the data using this as one of our variables, but there are a lot of times during the day. A quick two-variable pivot table looking at time and location gives us the rather long table shown below. The length is due to the large number of values for the variable ”Time”. If our variable had been a continuous numerical variable instead, we could have an even bigger table, with one row for each different value of the variable.


Average of Count	Location

Time	Cranny	Hole	Nook	Grand Total

500	3.91	2.60	5.95	4.15

530	16.69	10.84	20.01	15.85

600	31.03	20.59	36.51	29.38

630	32.83	22.33	36.94	30.70

700	28.43	18.73	32.63	26.60

730	25.19	16.88	29.48	23.85

800	16.17	10.72	19.43	15.44

830	12.51	8.36	15.90	12.26

900	6.67	4.27	8.99	6.64

930	5.42	3.69	7.63	5.58

1000	6.56	4.52	8.85	6.64

1030	8.71	5.67	11.19	8.52

1100	16.36	10.70	19.87	15.64

1130	20.63	13.17	24.20	19.33

1200	29.29	18.77	33.52	27.19

1230	31.54	19.85	36.02	29.13

1300	31.06	19.27	35.37	28.57

1330	26.52	17.85	30.84	25.07

1400	3.96	2.55	6.00	4.17

Grand Total	18.60	12.18	22.07	17.62

Large tables are generally undesirable. They are harder to read and harder to interpret. Usually, though, when you use a numerical variable as either the column or row variable in a pivot table, there is a way to group the values of the variable to make it easier to read the table. In this case, we could easily group all the times between 500 and 1000 into a ”Breakfast” group all the other times into a ”Lunch” group.


Average of Count		Location

Time2	Time	Cranny	Hole	Nook	Grand Total

Group1	500	3.91	2.60	5.95	4.15

	530	16.69	10.84	20.01	15.85

	600	31.03	20.59	36.51	29.38

	630	32.83	22.33	36.94	30.70

	700	28.43	18.73	32.63	26.60

	730	25.19	16.88	29.48	23.85

	800	16.17	10.72	19.43	15.44

	830	12.51	8.36	15.90	12.26

	900	6.67	4.27	8.99	6.64

	930	5.42	3.69	7.63	5.58

Group2	1000	6.56	4.52	8.85	6.64

	1030	8.71	5.67	11.19	8.52

	1100	16.36	10.70	19.87	15.64

	1130	20.63	13.17	24.20	19.33

	1200	29.29	18.77	33.52	27.19

	1230	31.54	19.85	36.02	29.13

	1300	31.06	19.27	35.37	28.57

	1330	26.52	17.85	30.84	25.07

	1400	3.96	2.55	6.00	4.17

Grand Total		18.60	12.18	22.07	17.62

At first glance, this has not really helped. The table is in fact larger by one column. But by hiding the details of a grouped variable, we can get a much smaller table, letting us quickly compare the morning (Group 1) and noon (Group 2) rushes.


Average of Count		Location

Time2	Time	Cranny	Hole	Nook	Grand Total

Group1		17.88	11.90	21.35	17.04

Group2		19.40	12.48	22.87	18.25

Grand Total		18.60	12.18	22.07	17.62

As a final way of looking at this data, consider using the count variable itself as a row variable. The counts in each section range from 0 to 65, leaving us with 66 rows in such a table. But by grouping them (which can easily be done with a numerical variable like ”Count”) we can quickly see how many 30 minute blocks of time (observations) in each area of the restaurant fell into each grouping of the number of patrons at our restaurant.


Count of Count	Location

Count	Cranny	Hole	Nook	Grand Total

0-9	675	1030	517	2222

10-19	529	672	464	1665

20-29	498	311	516	1325

30-39	303	102	421	826

40-49	97	12	178	287

50-59	25	1	30	56

60-69	1		2	3

Grand Total	2128	2128	2128	6384

Essentially, this type of pivot table is a frequency table of the variable ”Count”. These are useful for creating histograms and other visual representations of one-variable data, which are discussed in chapter 5.

3.2.2 Worked Examples