9.2.1 Definitions and Formulas

Dummy variables
These are variables made from a categorical variable. For each category in the variable, one dummy variable must be created. Normally, these are named by adding the category name to the end of the variable name. For a given observation, if the observation is in the category associated with a dummy variable, then the value of the dummy variable is 1 (for “yes, I’m in this category”). If the observation is not in the category associated with the dummy variable, then the dummy variable is equal to 0 (for “no, I’m not one of these”). Dummy variables are also called indicator or 0-1 variables.

Dummy variables are called “dummy” because they are artificial variables that 1) do not occur in the original data and 2) are created solely for the purpose of transforming categorical data into numerical data.
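
As a small illustration of the definition, the following sketch builds one dummy variable per category using only the Python standard library. The helper name make_dummies, the variable name color, and the five data values are hypothetical and chosen only for this example; the naming rule (variable name followed by category name, here joined with an underscore) follows the convention described above.

```python
def make_dummies(values, name):
    """Return one 0-1 column per category, keyed as '<name>_<category>'."""
    categories = sorted(set(values))
    return {
        f"{name}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

# Hypothetical data: the color recorded for five observations.
color = ["red", "blue", "green", "blue", "red"]
dummies = make_dummies(color, "color")

for column, indicator in dummies.items():
    print(column, indicator)
# color_blue [0, 1, 0, 1, 0]
# color_green [0, 0, 1, 0, 0]
# color_red [1, 0, 0, 0, 1]
```

Each observation has a 1 in exactly one of the three columns, because every observation belongs to exactly one category.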

Exact multicollinearity
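This is an error that occurs when some of the explanatory variables are exactly related by a linear equation, that is, when one of them can be written exactly as a linear combination of the others. In that case the regression coefficients cannot be determined uniquely.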
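
The sketch below illustrates why an exact linear relation is a problem, assuming the model is fit by ordinary least squares. Least squares needs the matrix X^T X (built from the columns of explanatory variables) to be invertible, and an exact linear relation among the columns makes its determinant zero. The variables x1, x2, x3 and their values are hypothetical, and the helper functions are written only for this example.

```python
def gram_matrix(columns):
    """Return X^T X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical explanatory variables for four observations.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
x3 = [2 * a + 3 * b for a, b in zip(x1, x2)]  # exact relation: x3 = 2*x1 + 3*x2

# With the exact relation, X^T X is singular (determinant exactly 0).
singular_det = det3(gram_matrix([x1, x2, x3]))

# Changing x3 in a single observation breaks the exact relation.
x3_changed = x3[:]
x3_changed[0] += 1
regular_det = det3(gram_matrix([x1, x2, x3_changed]))

print(singular_det, regular_det)
```

Here singular_det is 0, so no unique set of coefficients exists, while regular_det is nonzero once the relation no longer holds exactly.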

Reference category
When creating a regression model, to avoid exact multicollinearity, one dummy variable must be left out of each group of dummy variables that came from a single categorical variable. The category of the omitted dummy variable is the reference category: the coefficient of each remaining dummy variable in the group is interpreted by comparison with it.
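
The following sketch shows the effect of leaving out a reference category, assuming the model includes an intercept (a column of 1s). If every category had a dummy variable, the dummy columns would add up to the intercept column in every observation, which is an exact linear relation. The helper name make_dummies_with_reference, the variable color, its values, and the choice of "blue" as the reference are all hypothetical.

```python
def make_dummies_with_reference(values, name, reference):
    """Return one 0-1 column per category, skipping the reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError(f"{reference!r} is not a category of {name}")
    return {
        f"{name}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
        if cat != reference
    }

# Hypothetical data: the color recorded for five observations.
color = ["red", "blue", "green", "blue", "red"]
dummies = make_dummies_with_reference(color, "color", reference="blue")

# Intercept column of the regression model.
intercept = [1] * len(color)

# Sum of the remaining dummy columns, observation by observation.
row_sums = [sum(col[i] for col in dummies.values()) for i in range(len(color))]

print(sorted(dummies))        # ['color_green', 'color_red']
print(row_sums)               # [1, 0, 1, 0, 1]
print(row_sums == intercept)  # False
```

Observations in the reference category have 0 in every remaining dummy column, so the columns no longer add up to the intercept. In a fitted model, the coefficient of color_red would then describe how red observations differ from blue ones, with the other explanatory variables held fixed.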