We are becoming aware that gender may have a significant impact on employees’ salaries at EnPact. But is its impact isolated from that of the other variables that affect salary? Is it possible that the variable GenderFemale, for example, is somehow implicated in the impact that some other variable, say YrsExp, has on salary? If so, then a portion of the magnitude of the coefficient of YrsExp (the measurable effect of experience on salary) should actually be attributed to gender. Or, to put it another way, some of the effect of gender on salary is absorbed by the coefficient of experience. This means that our regression model is not measuring the true effect that gender has on salary. In addition, our understanding of the nature of any alleged discrimination at EnPact would be greatly increased if we could measure not only the effect that gender by itself has on salary, but also the effect that the interaction between gender and years of experience has on employees’ salaries. Similarly, it would also be informative to learn, for example, that gender does not play a role in how some other variable, say education, affects salary.
These kinds of combined effects can be captured in regression models by forming new variables called interaction variables (or terms), which are created by taking the product of two variables that we believe have a combined effect on the dependent variable. The first entry in a column of data for an interaction variable X1 × X2 is the product of the first entry of X1 with the first entry of X2. The second entry of X1 × X2 is the product of the second entry of X1 with the second entry of X2, and so on. When the interaction variables and the original variables are submitted to a regression routine, its computational procedure makes no distinction between variables that are interaction variables and those that are not. When the regression coefficients are computed for any set of variables, the software treats all named columns of data the same, whether those names are GenderFemale, YrsExp, or GenderFemale*YrsExp. Most packages have a convenient routine for creating interaction terms.
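To make the entry-by-entry product concrete, here is a minimal sketch in Python using NumPy. The data are hypothetical toy values, not EnPact's records, and the variable names (gender_female, yrs_exp, gender_x_exp) are our own illustrative choices. The salaries are generated from coefficients we pick ourselves, so that we can check that ordinary least squares treats the interaction column like any other column.

```python
import numpy as np

# Hypothetical toy data (NOT EnPact's actual records).
gender_female = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # 1 = female
yrs_exp = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)

# Interaction variable: each entry is the product of the matching entries.
gender_x_exp = gender_female * yrs_exp

# Hypothetical salaries (in thousands) built from coefficients we chose,
# with no noise, purely so the fit below can be checked by eye.
salary = 40.0 + 2.0 * yrs_exp - 3.0 * gender_female - 0.5 * gender_x_exp

# Design matrix: intercept, GenderFemale, YrsExp, GenderFemale*YrsExp.
# The least-squares routine sees four ordinary columns of numbers.
X = np.column_stack([np.ones_like(yrs_exp), gender_female, yrs_exp, gender_x_exp])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

for name, value in zip(["Intercept", "GenderFemale", "YrsExp", "GenderFemale*YrsExp"], coef):
    print(f"{name}: {value:.3f}")
```

Because the toy salaries contain no noise, the fitted coefficients should match the values used to generate them (40, -3, 2, and -0.5) up to rounding error. With real data the estimates would of course carry sampling error, and a statistics package would normally build the interaction column for us.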
The following is an example of a regression model containing interaction variables:
Things to know about interaction terms when building models: