These are variables made from a categorical variable. For each category
of the variable, one dummy variable is created. Normally, each dummy variable is named by
appending the category name to the end of the variable name. For a given observation, if
the observation is in the category associated with a dummy variable, then the value of
that dummy variable is 1 (for “yes, I’m in this category”). If the observation is not in
the category associated with the dummy variable, then the dummy variable is equal
to 0 (for “no, I’m not one of these”). Dummy variables are also called indicator or 0-1
variables.
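The construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `make_dummies`, the underscore separator between variable name and category, and the `region` data are all hypothetical choices made for the example.

```python
def make_dummies(values, name):
    """Create one 0-1 dummy variable per category of a categorical variable.

    Each dummy is named by appending the category to the variable name
    (joined with an underscore, which is an assumption of this sketch).
    """
    categories = sorted(set(values))  # one dummy per distinct category
    return {
        f"{name}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical categorical variable with three categories
region = ["North", "South", "West", "North", "West"]
dummies = make_dummies(region, "region")
for col, vals in dummies.items():
    print(col, vals)
# region_North [1, 0, 0, 1, 0]
# region_South [0, 1, 0, 0, 0]
# region_West [0, 0, 1, 0, 1]
```

Because every observation falls in exactly one category, the dummy variables for a given observation always contain a single 1 and add up to 1.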
Dummy variables are called “dummy” because they are artificial variables that (1) do
not occur in the original data and (2) are created solely for the purpose of transforming
categorical data into numerical data.
Exact multicollinearity
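This is an error that can occur if some of the explanatory
variables are exactly related by a linear equation, that is, if one explanatory variable can be
written exactly as a linear combination of the others. When this happens, the least squares
coefficients cannot be uniquely determined.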
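A common way to run into exact multicollinearity is through dummy variables. The sketch below assumes a model with an intercept (a column of 1s): if a dummy variable is included for every category, the dummies add up to the intercept column in every row, so the columns of the design matrix are linearly dependent and its rank is less than its number of columns. The helper `matrix_rank` and the `region` data are hypothetical, written only for this illustration; exact fractions are used to avoid rounding error.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Find a row at or below the current rank with a nonzero entry here
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


region = ["North", "South", "West", "North", "West"]
north = [1 if v == "North" else 0 for v in region]
south = [1 if v == "South" else 0 for v in region]
west = [1 if v == "West" else 0 for v in region]
intercept = [1] * len(region)

# Intercept plus ALL dummies: north + south + west equals the intercept column
all_dummies = [list(row) for row in zip(intercept, north, south, west)]
# Intercept plus all dummies except "North"
one_left_out = [list(row) for row in zip(intercept, south, west)]

print(matrix_rank(all_dummies), "of 4 columns")   # 3 of 4 columns
print(matrix_rank(one_left_out), "of 3 columns")  # 3 of 3 columns
```

With all three dummies the design matrix has only rank 3 for 4 columns, which is exact multicollinearity; dropping one dummy restores full column rank.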
Reference category
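When creating a regression model, to avoid exact multicollinearity, it
is necessary that one of the dummy variables be left out of each group that came from
a single categorical variable. (In a model with an intercept, the dummy variables from one
categorical variable always add up to 1, which duplicates the intercept exactly.) The
category whose dummy variable is left out is the reference category: the coefficient of
each remaining dummy variable in that group is interpreted as a difference relative to
this category.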
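The interpretation relative to the reference category can be checked numerically. The sketch below assumes the simplest case, a model with an intercept and a single categorical predictor, fitted by ordinary least squares through the normal equations. The helpers `solve` and `ols`, the `region` data, and the response values `y` are all hypothetical. In this case the intercept equals the mean response of the reference category, and each dummy coefficient equals the difference between that category's mean and the reference mean.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]  # scale pivot row so pivot is 1
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


region = ["North", "South", "West", "North", "West", "South"]
y = [10, 14, 7, 12, 9, 16]  # hypothetical response values

# "North" is the reference category, so its dummy is left out
X = [[1, int(r == "South"), int(r == "West")] for r in region]
b0, b_south, b_west = ols(X, y)
print(b0, b_south, b_west)  # 11 4 -3
```

Here the North mean is 11, the South mean is 15, and the West mean is 8, so the South coefficient (4) and the West coefficient (-3) are each read as "compared with North".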