2.1.2 Worked Examples

The worked examples below should help you decide what type of data you are extracting from a problem situation as well as the units or categories in which it should be recorded.


Example 2.1. Salary Data: Type and Units
Consider organizing data about the salaries of employees at a company. We might be interested in each employee’s salary as well as his or her position with the company and experience. Our analysis, and thus our findings, will clearly depend on what data we collect, but just as importantly, the analysis will depend on how we record, or code, that data. Even with just a few simple variables in our data, we have many options to consider. In the first table, we record the data much as you might initially expect.





| Variable | Type | Units/Categories | Notes |
| --- | --- | --- | --- |
| Employee | Identifier | No units | Employee ID Number |
| Salary | Numerical continuous | Dollars (e.g. $34856) | Annual Gross Salary |
| Dept | Categorical nominal | S = Sales, P = Purchasing, A = Accounting, R = Research | Department in which employee works |
| YrsExp | Numerical discrete | Years | Years of working experience (not necessarily all with this company). |

There is nothing wrong with this fairly straightforward approach to recording the data. However, the salary is recorded with a good deal more precision than is probably needed, and the years of experience will vary widely across the company. So one might consider simplifying these, recording the salary in thousands of dollars and treating experience as a categorical variable, as in the second table below.

Note that by changing the data type, that is, the way we record the data, we change how we can analyze the years-of-experience data we have collected. Recording experience as a number lets us pinpoint the typical experience of an employee by finding the mean, because YrsExp is then numerical data; we cannot compute such a number when the data is coded categorically. On the other hand, the categorical coding offers a broader picture of the company’s workforce by counting the number of employees falling in the new, junior, middle, and senior categories. Such a summary would be more difficult to produce if the data were recorded only in actual years of experience. For maximum flexibility, one might even keep two variables for years of experience: in the first, the experience is recorded as in the first table, using the actual years; in the second, it is recorded categorically to allow for easier data summaries. In fact, one could record the actual years and compute the second variable from the first as a description of the experience level.





| Variable | Type | Units/Categories | Notes |
| --- | --- | --- | --- |
| Salary | Numerical continuous | Thousands of Dollars (e.g. 34.9) | Annual Gross Salary |
| Dept | Categorical nominal | 1 = Sales, 2 = Purchasing, 3 = Accounting, 4 = Research | Department in which employee works |
| YrsExp | Categorical ordinal | New: < 3 years; Junior: 3 to < 10 years; Middle: 10 to < 20 years; Senior: 20 or more years | Years of working experience (not necessarily all with this company). |
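
The recoding from the first table to the second can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `salary_in_thousands` and `experience_level` and the sample values are hypothetical, not part of the original example. The category cutoffs are the ones listed in the second table.

```python
from collections import Counter
from statistics import mean


def salary_in_thousands(dollars):
    """Recode an annual salary in dollars as thousands of dollars (one decimal)."""
    return round(dollars / 1000, 1)


def experience_level(years):
    """Recode numerical years of experience into the ordinal categories of the second table."""
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    if years < 3:
        return "New"
    if years < 10:
        return "Junior"
    if years < 20:
        return "Middle"
    return "Senior"


# Hypothetical YrsExp values, made up purely for illustration.
yrs_exp = [1, 4, 7, 12, 25, 2, 15]

# The numerical coding supports a mean ...
average_experience = mean(yrs_exp)

# ... while the categorical coding supports a count per category.
level_counts = Counter(experience_level(y) for y in yrs_exp)

print(salary_in_thousands(34856))  # 34.9
print(round(average_experience, 2))
print(dict(level_counts))
```

Keeping the numerical variable and deriving the categorical one from it, as here, preserves both kinds of summary; going the other way is not possible, since the category alone does not determine the number of years.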






Example 2.2. Designing an observational data collection form

Consider the following request from Jenny Eggs, regarding her restaurant:

To: Oracular Consulting
From: Jenny Eggs, Owner of Over-Easy Diner
Date: Today
Re: Seating complaints

As you may be aware, my restaurant, Over-Easy Diner, has been serving breakfast and lunch to the citizens of this fine town for the last 50 years. Recently I have overheard a number of comments from the servers indicating that the customers are complaining to them about the comfort of the chairs in the dining area. Last week an anonymous editorial appeared in our local paper branding us “The Worst Seat in Town.” In order to better understand the potential causes of customer discomfort, I would like for you to collect some data for me. I am particularly interested in the following:

Over Easy serves breakfast and lunch. There are three distinct seating areas, the Nook, the Cranny, and the Hole, where diners seat themselves. The manager wants to redesign the cafeteria and would like to collect data on the seating occupancy patterns in the three dining areas every day over a two-week period beginning on Monday, June 9. Our goal is to first design an observational data collection form, including an explanation of the units and categories.

Step 1. Decide what data is to be collected





| Variable | Type | Units/Categories | Notes |
| --- | --- | --- | --- |
| Date | Numerical discrete | MM/DD/YYYY | Date observations were recorded |
| Day of Week | Categorical | M: Mon, T: Tues, W: Wed, H: Thurs, F: Fri, S: Sat, N: Sun | |
| Time | Numerical continuous | HH:MM AM/PM | |
| Nook | Numerical discrete | Customers | How many customers are seated in “Nook”? |
| Cranny | Numerical discrete | Customers | How many customers are seated in “Cranny”? |
| Hole | Numerical discrete | Customers | How many customers are seated in the “Hole”? |
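
One way to make these variable definitions concrete is as a small record type. The sketch below is an illustration only: the class name `SeatingObservation`, its field names, and the year 2025 (chosen simply because June 9 falls on a Monday in that year) are assumptions, not part of the original example. It derives the day-of-week code from the date, so the two cannot disagree, and checks that each count is a non-negative whole number.

```python
from dataclasses import dataclass
from datetime import date, time

# Day codes from the table, ordered to match date.weekday() (Monday == 0).
DAY_CODES = "MTWHFSN"


@dataclass(frozen=True)
class SeatingObservation:
    """One row of observations (hypothetical field names)."""

    observed_on: date  # Date variable
    observed_at: time  # Time variable
    nook: int          # customers seated in the Nook
    cranny: int        # customers seated in the Cranny
    hole: int          # customers seated in the Hole

    def __post_init__(self):
        # Counts are numerical discrete: whole numbers, never negative.
        for area in ("nook", "cranny", "hole"):
            count = getattr(self, area)
            if not isinstance(count, int) or count < 0:
                raise ValueError(f"{area} must be a non-negative whole number")

    @property
    def day_code(self):
        """Day of Week code (M, T, W, H, F, S, N) derived from the date."""
        return DAY_CODES[self.observed_on.weekday()]


# Hypothetical observation; the year and the counts are assumed for illustration.
obs = SeatingObservation(date(2025, 6, 9), time(9, 30), nook=12, cranny=8, hole=5)
print(obs.observed_on.strftime("%m/%d/%Y"), obs.day_code)
```

Deriving the day code rather than recording it separately is a design choice: it removes one column that an observer could fill in incorrectly, at the cost of needing the full date, including the year.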





Step 2. Design a data collection form for the observational data.

A simple data collection form for seating patterns might look like the blank form below, with a column for each of the variables and a row for each set of observations. In this case, we have an observational form: someone will have to look around the restaurant on particular days and at particular times and record the data. Such observational data, however they are obtained, are essential for understanding what is actually happening in a problem situation.

BLANK DATA COLLECTION FORM FOR OVER EASY







| Date (MM/DD) | Day (MTWHFSN) | Time (HH:MM AM/PM) | Nook | Cranny | Hole |
| --- | --- | --- | --- | --- | --- |
| | | | | | |
| | | | | | |
| | | | | | |

COMPLETED DATA COLLECTION FORM FOR OVER EASY







| Date (MM/DD) | Day (MTWHFSN) | Time (HH:MM AM/PM) | Nook | Cranny | Hole |
| --- | --- | --- | --- | --- | --- |
| 06/12 | H | 09:30 AM | 23 | 24 | 16 |
| 06/15 | N | 01:00 PM | 28 | 15 | 34 |
| etc. | | | | | |
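
Once forms are filled in, each row can be transcribed and summarized. The following sketch assumes the rows are typed in as tuples; the variable names are illustrative, and the Day column is left out because it can be derived from the date. It parses the HH:MM AM/PM time format and totals the customers per observation and per seating area.

```python
from datetime import datetime

AREAS = ("Nook", "Cranny", "Hole")

# The two rows shown in the completed form: (date, time, Nook, Cranny, Hole).
rows = [
    ("06/12", "09:30 AM", 23, 24, 16),
    ("06/15", "01:00 PM", 28, 15, 34),
]


def parse_time(text):
    """Parse an 'HH:MM AM/PM' entry into a datetime.time value."""
    return datetime.strptime(text, "%I:%M %p").time()


# Total customers seated at each observation.
totals_per_observation = [sum(counts) for _, _, *counts in rows]

# Total customers counted in each area across all observations.
totals_per_area = {
    area: sum(row[2 + i] for row in rows) for i, area in enumerate(AREAS)
}

print(totals_per_observation)  # [63, 77]
print(totals_per_area)         # {'Nook': 51, 'Cranny': 39, 'Hole': 50}
print(parse_time("01:00 PM"))  # 13:00:00
```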







Example 2.3. Designing a survey questionnaire form
The memo suggests that the cafeteria manager also wants to collect some customer preference data before remodeling the cafeteria. We need to design a questionnaire for this purpose. The manager will offer free juice, coffee, or side orders to induce customers to fill out the forms, one per customer. Information about the variables and ways of measuring them appears in the table below.





| Variable Name | Type | Units/Categories | Notes |
| --- | --- | --- | --- |
| FirstVisit | Categorical | Y = Yes, N = No | Is this your first visit? |
| Room | Categorical | P = Plenty, E = Enough, N = Need more space | Is there enough room between the tables? |
| ChairSize | Categorical ordinal | 1 to 4 (1 = great, 4 = terrible) | Rank the comfort of the chairs. |
| ChairCushion | Categorical ordinal | 1 to 4 (1 = great, 4 = terrible) | Rank the cushioning of the chairs. |
| ChairFit | Categorical ordinal | 1 to 4 (1 = great, 4 = terrible) | Rank the fit to the body of the chairs. |
| Keep | Categorical | Y = Yes (keep), N = No (combine) | Should we keep the separate areas? |

A possible survey form might look like the one below. Notice that this data is all opinion data. This is why we need multiple methods of data collection: combining the survey with the observational form lets us triangulate, corroborating what each method reports against the other.


Over Easy Customer Satisfaction Survey


Please circle your answers:

  1. Is this your first visit to Over Easy?
    Yes    No
  2. Is there enough room between the tables?
    Plenty    Adequate    Need more space
  3. Please rank the comfort of the chairs on a scale of 1 to 4 (1 is “great”; 4 is “terrible”):
    1. Size:           1 (Great)    2    3    4 (Terrible)
    2. Cushioning:     1 (Great)    2    3    4 (Terrible)
    3. Fit to Body:    1 (Great)    2    3    4 (Terrible)
  4. Should we keep the Nook, Cranny, and Hole areas, or should we make one large area?
    Yes, keep them    No, make one large area    Doesn’t matter
  5. Any additional comments about your experience at Over Easy?


Note: Questions 1, 2, and 4 collect categorical nominal data. Question 3 collects categorical ordinal data.
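
As a rough illustration of how this coding affects analysis, the sketch below checks and tallies a handful of responses. The responses are invented for illustration, and the dictionary keys simply reuse the variable names from the variable table; nothing here is prescribed by the original example. The “Doesn’t matter” choice on the form has no code in the variable table, so it is not handled. Nominal answers are summarized by counting, while the ordinal 1-to-4 ratings can also be summarized by a median.

```python
from collections import Counter
from statistics import median

# Allowed codes for each variable, taken from the variable table.
ALLOWED = {
    "FirstVisit": {"Y", "N"},
    "Room": {"P", "E", "N"},
    "ChairSize": {1, 2, 3, 4},
    "ChairCushion": {1, 2, 3, 4},
    "ChairFit": {1, 2, 3, 4},
    "Keep": {"Y", "N"},
}


def validate(response):
    """Return the response unchanged, or raise ValueError on an unexpected code."""
    for variable, allowed in ALLOWED.items():
        if response[variable] not in allowed:
            raise ValueError(f"{variable}: unexpected code {response[variable]!r}")
    return response


# Invented survey responses, for illustration only.
responses = [
    validate({"FirstVisit": "N", "Room": "E", "ChairSize": 2,
              "ChairCushion": 4, "ChairFit": 3, "Keep": "Y"}),
    validate({"FirstVisit": "Y", "Room": "N", "ChairSize": 3,
              "ChairCushion": 3, "ChairFit": 3, "Keep": "N"}),
    validate({"FirstVisit": "N", "Room": "E", "ChairSize": 1,
              "ChairCushion": 4, "ChairFit": 2, "Keep": "Y"}),
]

# Nominal data: count how often each category occurs.
room_counts = Counter(r["Room"] for r in responses)

# Ordinal data: the ordering is meaningful, so a median rating makes sense.
cushion_median = median(r["ChairCushion"] for r in responses)

print(dict(room_counts))  # {'E': 2, 'N': 1}
print(cushion_median)     # 4
```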