2.1. Open the spreadsheet C02 Homes.xls [.rda]. This file contains data on over 270 homes that sold in the greater Rochester, NY area during a three-month period in the year 2000. Identify each variable in the data and classify it as either numerical or categorical. For numerical variables, give a rough idea of the range of the variable. For categorical variables, list each of the categories and how it is coded.
Variable Name | Type | Range/Units/Categories | Notes |
| | | |
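As a rough illustration of the classification step in 2.1, the sketch below treats a column as numerical when every value parses as a number and as categorical otherwise. The helper name `summarize_column` and the toy values are hypothetical; they are not taken from C02 Homes.xls.

```python
def summarize_column(values):
    """Describe a column: its range if numerical, its categories otherwise."""
    try:
        numbers = [float(v) for v in values]
    except (TypeError, ValueError):
        # At least one value is not a number, so treat the column as categorical.
        return {"type": "categorical", "categories": sorted(set(map(str, values)))}
    return {"type": "numerical", "range": (min(numbers), max(numbers))}


# Made-up values for illustration only (not from the Homes data).
toy_columns = {
    "ExamplePrice": [89900, 125000, 310000],
    "ExampleStyle": ["Ranch", "Colonial", "Ranch"],
}

for name, values in toy_columns.items():
    print(name, summarize_column(values))
```

A categorical variable stored as numeric codes (for example 0/1) would be reported as numerical by this check, so the coding of each variable still has to be confirmed by inspecting the data.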
2.2. Problem situation: Demand for analysts at Delphinium Consulting, Inc. is growing. Delphinium often loses its best consultants to its competitors in the industry, although consultants who stay with Delphinium for at least three years tend to stay with the company much longer.
Problem: The CEO of Delphinium is concerned about the retention of her analysts and has identified the data below that she would like to collect. Your job is to specify reasonable units or codes for each of these variables.
Variable | Description | Units/Codes |
StartingSalary | Salary upon hiring at Delphinium | |
OutOfOffice | Percentage of time the consultant spends out of the office working with clients | |
LocalGrad | Whether or not the employee graduated from a local university/college | |
Major | Undergraduate major | |
Tenure | Time the employee has spent with the company | |
2.3. In problem 2.2, change the numerical variables StartingSalary, OutOfOffice, and Tenure into categorical variables. For example, to change a numerical variable like TaxPercentage into a categorical variable, we might define three categories:
Low | less than 10% |
Middle | between 10% and 20% inclusive |
High | greater than 20% |
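The TaxPercentage example can be sketched in code as follows. The function name `tax_category` is hypothetical, and the input is assumed to be a percentage such as 12.5 rather than a fraction such as 0.125.

```python
def tax_category(tax_percentage):
    """Map a tax percentage (e.g. 12.5 for 12.5%) to Low, Middle, or High."""
    if tax_percentage < 10:
        return "Low"      # less than 10%
    if tax_percentage <= 20:
        return "Middle"   # between 10% and 20% inclusive
    return "High"         # greater than 20%


# Values at and around the cut points show which side each boundary falls on.
for pct in (9.9, 10, 20, 20.1):
    print(pct, tax_category(pct))
```

Checking values at the boundaries (10 and 20) is a quick way to confirm that every possible value falls into exactly one category.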
2.4. Create a spreadsheet using the variables you defined in problems 2.2 and 2.3 above. Create test (fake) data for 5 observations that demonstrate the range of values for each of your variables.
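One possible way to build such a file programmatically is sketched below using Python's csv module; the output can be opened in any spreadsheet program. The column names come from problem 2.2, but the units (dollars, percent, 1 = yes / 0 = no, years) and every value are hypothetical choices made up for illustration, and the helper name `to_csv` is also an assumption.

```python
import csv
import io

# Column names from problem 2.2; units and values below are made-up test data.
FIELDS = ["StartingSalary", "OutOfOffice", "LocalGrad", "Major", "Tenure"]
ROWS = [
    (42000, 0, 1, "Economics", 0.5),
    (55000, 25, 0, "Mathematics", 2.0),
    (61000, 50, 1, "Computer Science", 3.0),
    (78000, 75, 0, "Engineering", 7.5),
    (95000, 100, 1, "Business", 15.0),
]


def to_csv(fields, rows):
    """Return the header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(FIELDS, ROWS))
```

The five rows are spread from low to high on each numerical variable and use both LocalGrad codes, which is the kind of coverage the problem asks the test data to demonstrate.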