The worked examples below should help you decide what type of data you are extracting from a problem situation as well as the units or categories in which it should be recorded.
Example 2.1. Salary Data: Type and Units
Consider organizing data about the salaries of employees at a company. We might be interested in
each employee’s salary as well as his or her position with the company and experience. Our
analysis, and thus our findings, will clearly depend on what data we collect, but just as
importantly, the analysis will depend on how we record, or code, that data. Even with just a few
simple variables in our data, we have many options to consider. In the first table, we record the
data much as you might initially expect.
Variable | Type | Units/Categories | Notes |
Employee | Identifier | No units | Employee ID number |
Salary | Numerical continuous | Dollars (e.g. $34856) | Annual gross salary |
Dept | Categorical nominal | S = Sales, P = Purchasing, A = Accounting, R = Research | Department in which employee works |
YrsExp | Numerical discrete | Years | Years of working experience (not necessarily all with this company) |
There is nothing wrong with this fairly straightforward approach to recording the data. However, the salary is recorded with more precision than is probably needed, and the years of experience will vary widely across the company. So one might consider simplifying these, recording the salary in thousands of dollars and treating experience as a categorical variable.
Note that changing the data type, that is, the way we record the data, changes how we can analyze the years-of-experience data collected above. When YrsExp is recorded as a number, it is numerical data, so we can compute its mean and pinpoint the typical experience of an employee; we cannot find such a number when the data is coded categorically. On the other hand, the categorical coding offers us a broader picture of the company’s workforce experience by counting the number of employees falling in the new, junior, middle, and senior categories. Such a summary would be more difficult to produce if the data were recorded only in actual years of experience. For maximum flexibility, one might even consider having two variables for years of experience: in one, the experience is recorded as in the first table, using the actual years; in the other, it is recorded categorically to allow for easier data summaries. In fact, one could record only the actual years and compute the second variable from the first, so that it serves as a description of the experience.
Variable | Type | Units/Categories | Notes |
Salary | Numerical continuous | Thousands of dollars (e.g. 34.9) | Annual gross salary |
Dept | Categorical nominal | 1 = Sales, 2 = Purchasing, 3 = Accounting, 4 = Research | Department in which employee works |
YrsExp | Categorical ordinal | New: < 3 years; Junior: 3 to < 10 years; Middle: 10 to < 20 years; Senior: 20 or more years | Years of working experience (not necessarily all with this company) |
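The recoding described in this example can be sketched in a few lines of Python. This is only an illustration: the function names and the five employee records are hypothetical, and only the category boundaries and the one-decimal salary format come from the second table. It shows how the categorical experience variable can be computed from the numerical one, and which summary each coding supports.

```python
from collections import Counter
from statistics import mean


def salary_in_thousands(dollars):
    """Recode a salary in dollars to thousands of dollars, one decimal place."""
    return round(dollars / 1000, 1)


def experience_category(years):
    """Recode years of experience into the ordinal categories of the second table."""
    if years < 3:
        return "New"
    if years < 10:
        return "Junior"
    if years < 20:
        return "Middle"
    return "Senior"


# Hypothetical records: (employee ID, salary in dollars, years of experience).
employees = [
    (101, 34856, 2),
    (102, 51200, 7),
    (103, 47500, 12),
    (104, 88000, 25),
    (105, 39900, 9),
]

years = [yrs for _, _, yrs in employees]

# The mean is only available when experience is kept as a number.
mean_years = mean(years)

# The counts per category are the natural summary of the categorical coding.
category_counts = Counter(experience_category(y) for y in years)

print(salary_in_thousands(34856))  # 34.9, the example value in the table
print(mean_years)
print(category_counts)
```

Keeping the numerical variable and deriving the categorical one from it, as here, preserves both kinds of summary; the reverse derivation is not possible.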
Example 2.2. Designing an observational data collection form
Consider the following request from Jenny Eggs, regarding her restaurant:
To: Oracular Consulting
From: Jenny Eggs, Owner of Over-Easy Diner
Date: Today
Re: Seating complaints
As you may be aware, my restaurant, Over-Easy Diner, has been serving breakfast and lunch to the citizens of this fine town for the last 50 years. Recently I have overheard a number of comments from the servers indicating that the customers are complaining to them about the comfort of the chairs in the dining area. Last week an anonymous editorial appeared in our local paper branding us “The Worst Seat in Town”. In order to better understand the potential causes of customer discomfort, I would like you to collect some data for me. I am particularly interested in the following:
Over Easy serves breakfast and lunch. There are three distinct seating areas, the Nook, the Cranny, and the Hole, where diners seat themselves. The manager wants to redesign the cafeteria and would like to collect data on the seating occupancy patterns in the three dining areas every day over a two-week period beginning on Monday, June 9. Our goal is to first design an observational data collection form, including an explanation of the units and categories.
Step 1. Decide what data is to be collected
Variable | Type | Units/Categories | Notes |
Date | Numerical discrete | MM/DD/YYYY | Date observations were recorded |
Day of Week | Categorical | M: Mon, T: Tues, W: Wed, H: Thurs, F: Fri, S: Sat, N: Sun | |
Time | Numerical continuous | HH:MM AM/PM | |
Nook | Numerical discrete | Customers | How many customers are seated in “Nook”? |
Cranny | Numerical discrete | Customers | How many customers are seated in “Cranny”? |
Hole | Numerical discrete | Customers | How many customers are seated in the “Hole”? |
Step 2. Design a data collection form for the OBSERVATIONAL data.
A simple data collection form for seating patterns might look like the sheet below, with a column for each of the variables and a row for each set of observations. In this case, we have an observational form; someone will have to look around the restaurant on particular days and at particular times and record the data. Such observational data, however they are collected, are essential for understanding what is actually happening in a problem situation.
BLANK DATA COLLECTION FORM FOR OVER EASY
Date | Day | Time | Nook | Cranny | Hole |
(MM/DD) | (MTWHFSN) | (HH:MM AM/PM) | | | |
COMPLETED DATA COLLECTION FORM FOR OVER EASY
Date | Day | Time | Nook | Cranny | Hole |
(MM/DD) | (MTWHFSN) | (HH:MM AM/PM) | | | |
06/12 | H | 09:30 AM | 23 | 24 | 16 |
06/15 | N | 01:00 PM | 28 | 15 | 34 |
etc. | | | | | |
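Because the Day column is fully determined by the Date column, the first two columns of the form can be filled in ahead of time, which removes one source of recording error. The following is a minimal sketch, not part of the original form design. The names `day_code` and `blank_form_rows` are hypothetical, and the year 2003 is an assumption chosen only because June 9 falls on a Monday in that year; the memo does not state a year.

```python
from datetime import date, timedelta

# One-letter day codes in the order used on the form.
# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_CODES = "MTWHFSN"


def day_code(d):
    """Return the form's one-letter code for the weekday of date d."""
    return DAY_CODES[d.weekday()]


def blank_form_rows(start, days=14):
    """Return (MM/DD, day code) pairs for each day of the observation period."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append((d.strftime("%m/%d"), day_code(d)))
    return rows


# Assumption: the year is not given in the source; 2003 makes June 9 a Monday.
START = date(2003, 6, 9)

for mmdd, code in blank_form_rows(START):
    # Time and the three seating counts are left blank for the observer.
    print(f"{mmdd} | {code} | | | | |")
```

If several observations are planned per day, each generated row would simply be repeated once per observation time.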
Example 2.3. Designing a survey questionnaire form
The memo suggests that the cafeteria manager also wants to collect some customer preference data
before remodeling the cafeteria. We need to design a questionnaire for this purpose. The manager
will offer free juice, coffee, or side orders to induce customers to fill out the forms, one per
customer. Information about the variables and ways of measuring them appears in the table
below.
Variable Name | Type | Units/Categories | Notes |
FirstVisit | Categorical | Y = Yes, N = No | Is this your first visit? |
Room | Categorical | P = Plenty, E = Enough, N = Need more space | Is there enough room between the tables? |
ChairSize | Numerical discrete | 1 to 4 (1 = great, 4 = terrible) | Rank the comfort of the chairs. |
ChairCushion | Numerical discrete | 1 to 4 (1 = great, 4 = terrible) | Rank the cushioning of the chairs. |
ChairFit | Numerical discrete | 1 to 4 (1 = great, 4 = terrible) | Rank the fit to the body of the chairs. |
Keep | Categorical | Y = Yes (keep), N = No (combine) | Should we keep the separate areas? |
A possible survey form might look like the one below. Notice that this data is all opinion data. This is why we need multiple methods of data collection to triangulate the data: each method gives us different information, and comparing them helps us corroborate the data from each of the other methods of collection.
Over Easy Customer Satisfaction Survey
Please circle your answers:
Note: Questions 1, 2, and 4 collect categorical nominal data. Question 3 collects categorical ordinal data.
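The distinction in the note matters once the forms come back. As a rough sketch, nominal answers can only be counted, while ordinal ratings have an order and so also support a median. The five responses below are invented purely for illustration and the helper names `tally` and `median_rating` are hypothetical; only the variable names and codes come from the table above.

```python
from collections import Counter
from statistics import median

# Hypothetical completed surveys, coded as in the variable table.
responses = [
    {"FirstVisit": "N", "Room": "E", "ChairSize": 3, "ChairCushion": 4, "ChairFit": 2, "Keep": "Y"},
    {"FirstVisit": "Y", "Room": "N", "ChairSize": 2, "ChairCushion": 3, "ChairFit": 3, "Keep": "N"},
    {"FirstVisit": "N", "Room": "N", "ChairSize": 4, "ChairCushion": 4, "ChairFit": 4, "Keep": "Y"},
    {"FirstVisit": "N", "Room": "P", "ChairSize": 1, "ChairCushion": 2, "ChairFit": 2, "Keep": "Y"},
    {"FirstVisit": "Y", "Room": "N", "ChairSize": 3, "ChairCushion": 4, "ChairFit": 3, "Keep": "N"},
]


def tally(variable):
    """Count how often each category was circled (suitable for nominal data)."""
    return Counter(r[variable] for r in responses)


def median_rating(variable):
    """Middle rating on the 1-to-4 scale (suitable for ordinal data)."""
    return median(r[variable] for r in responses)


print(tally("Room"))
print(tally("Keep"))
print(median_rating("ChairCushion"))
```

The median is used rather than the mean because the gaps between ratings 1, 2, 3, and 4 need not be equal in the customers’ minds.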