Chapter 10
Is the Model Any Good?

In the last chapter we built regression models that measured the effects of several explanatory variables on a dependent variable. For example, we examined how educational background, prior experience, years with a company, job level, or gender affect salary. We determined how each explanatory variable, whether numerical or categorical, expressed its effect on salary through its coefficient in the regression equation. The process of building such a model is a statistical one; that is, it involves determining a best-fit equation by calculating how much of the total variation is accounted for by the model. This calculation, in turn, is based on certain probabilistic assumptions concerning how the data is distributed.

The first section of this chapter concerns how confident we can be that the coefficients of our explanatory variables are trustworthy. This is critically important if we are to make decisions based on what a model seems to be telling us. If our model is to be at all useful, we need criteria to determine which explanatory variables are truly significant in affecting the dependent variable and which are not. This section helps us to separate the wheat from the chaff.
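To make the idea of a trustworthy coefficient concrete, here is a minimal Python sketch. It is an illustration only, not the procedure this chapter develops. It fits a regression to synthetic, made-up salary data in which `experience` truly affects salary and `noise_var` does not, then computes each coefficient's standard error and an approximate 95% interval. All variable names and numbers are assumptions chosen for the example, and the multiplier 1.96 comes from the normal approximation (the exact multiplier from the t distribution is slightly larger).

```python
import numpy as np

# Synthetic data (assumed for illustration): salary depends on experience only.
rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 20, n)   # years of prior experience
noise_var = rng.normal(0, 1, n)      # a variable with no real effect on salary
salary = 30000 + 1500 * experience + rng.normal(0, 4000, n)

# Design matrix: a column of ones for the intercept, then the explanatory variables.
X = np.column_stack([np.ones(n), experience, noise_var])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Standard errors from the residual variance and (X'X)^-1.
residuals = salary - X @ coef
dof = n - X.shape[1]                        # observations minus fitted coefficients
sigma2 = residuals @ residuals / dof        # estimated error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Approximate 95% interval for each coefficient (normal approximation).
lower = coef - 1.96 * se
upper = coef + 1.96 * se

for name, b, lo, hi in zip(["intercept", "experience", "noise_var"], coef, lower, upper):
    print(f"{name:>10}: {b:10.2f}  [{lo:10.2f}, {hi:10.2f}]")
```

A coefficient whose interval lies well away from zero, as the one for `experience` does here, is one we can be fairly confident reflects a real effect. An interval that straddles zero suggests the variable may not belong in the model.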

The second section of this chapter furthers the process of building more complex and accurate models from several explanatory variables by considering how interactions between the variables themselves might affect the dependent variable. That is, some of these variables might express their effects on the dependent variable in combination with other explanatory variables. In fact, there are even cases in which an explanatory variable appears to have a significant effect only when it is combined with one or more other explanatory variables. For example, it may be that employees’ gender by itself has no significant effect on salary, but gender together with job level has a negative impact on salary. That is, gender has a significant negative effect on salary only when the employee is a female in a higher-level position: the well-known “glass ceiling” effect. This section, then, concerns not only the effects of several individual explanatory variables on a dependent variable, but also the effects of pairs of them. You will learn in this chapter how to create multiple regression models with interaction variables built from both numerical and categorical explanatory variables, how to assess their significance, and how to analyze and interpret these often complex models.
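As a rough illustration of an interaction variable, the following Python sketch builds one by multiplying a gender indicator by job level and then fits a regression that includes it. The data are synthetic and constructed, by assumption, so that gender alone has no effect on salary while each additional job level adds less salary for female employees. The names `female`, `job_level`, and `female_x_level` and all numbers are hypothetical and are not taken from the EnPact data used later in the chapter.

```python
import numpy as np

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(1)
n = 400
female = rng.integers(0, 2, n)       # 1 = female, 0 = male (male is the reference category)
job_level = rng.integers(1, 7, n)    # job levels 1 through 6

# The interaction variable is the product of the two existing variables.
female_x_level = female * job_level

# By construction: no main effect of gender, but a per-level penalty for females.
salary = (40000 + 8000 * job_level
          - 2000 * female_x_level
          + rng.normal(0, 2000, n))

# Fit salary on intercept, female, job_level, and the interaction term.
X = np.column_stack([np.ones(n), female, job_level, female_x_level])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

for name, b in zip(["intercept", "female", "job_level", "female_x_level"], coef):
    print(f"{name:>15}: {b:10.2f}")
```

In a model like this, the coefficient of `job_level` describes the salary gain per level for the reference category (males), and the coefficient of `female_x_level` describes how much that gain changes for females. A negative interaction coefficient is the pattern the glass-ceiling example describes.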

As a result of this chapter, students will learn

How to determine the trustworthiness of the coefficients of a regression equation

How to determine which coefficients should be kept in a model and which should not

How to interpret models with complex interaction terms involving both numerical and categorical variables

How to use stepwise regression to build complex models with significant variables

As a result of this chapter, students will be able to

Determine with 95% confidence the range of values within which regression coefficients fall

Create interaction terms

Identify the reference categories of interaction variables

Construct interaction variables from existing variables in a data set

Construct a model using interaction terms

 10.1 Which coefficients are trustworthy?
  10.1.1 Definitions and Formulas
  10.1.2 Worked Examples
  10.1.3 Exploration 10A: Building a Trustworthy Model at EnPact
 10.2 More Complexity with Interaction Terms
  10.2.1 Definitions and Formulas
  10.2.2 Worked Examples
  10.2.3 Exploration 10B: Complex Gender Interactions at EnPact
 10.3 Homework
  Mechanics and Techniques Problems
  Application and Reasoning Problems
 10.4 Memo Problem: Truck Maintenance Expenses, Part 2