8.2 Using and Comparing the Usefulness of a Proportional Model

Now, it’s one thing to have an equation to model data. We can compute a regression equation for almost any data set. However, the “best fit line” may not be a very good model for the data. We need to know not only the equation of the model, but also how good the model is. We will learn about two ways to measure “how good a model is”. The first measures how strongly the two variables in the model are linearly related. This is called the coefficient of determination (R^2), and it is related to the correlation between the two variables: in a model with a single explanatory variable, it is the square of the correlation coefficient. The second measure tells us how close predictions from our model will be to the actual data. This number is called the standard error of estimate (S_e) and is a kind of standard deviation, indicating how spread out the data is around the model.
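To make these two numbers concrete, here is a minimal sketch in Python of how they might be computed for a straight-line model. It assumes the common definitions R^2 = 1 - SSE/SST and S_e = sqrt(SSE/(n - 2)); the formulas in Section 8.2.1 are the ones to rely on if they differ. The function names and the small data set are made up for illustration.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for the model y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared_and_se(x, y):
    """Return (R^2, S_e) for the least-squares line through (x, y)."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    # SSE: variation left unexplained by the line.
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    # SST: total variation of y about its mean.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    # Assumed definition: n - 2 degrees of freedom for one explanatory variable.
    se = math.sqrt(sse / (len(y) - 2))
    return r2, se


# Hypothetical data, invented for this illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2, se = r_squared_and_se(x, y)
print(f"R^2 = {r2:.4f}, S_e = {se:.4f}")
```

An R^2 close to 1 says the line accounts for most of the variation in y, while S_e is in the same units as y, so it can be read as a typical size for the prediction error.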

These two quantities relate to the entire regression model, reducing some characteristic “error” in the model down to single numbers. There are other ways to check the quality of the regression model, however. Most statistical packages provide diagnostic graphs for checking the regression model. Two of the most important are the graph of the predicted values (also called fitted values) versus the actual response variable data, and the graph of the residuals (the errors in the model) versus the fitted values. A quick look at these two scatterplots can often tell you a lot about the quality of the model. Taken together with the coefficient of determination and the standard error of estimate, these are very powerful tools for determining the quality of the regression models you produce. After all, it is easy to simply point and click to produce more and more regression models; what is difficult is learning which ones are useful and to what extent they are useful.
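The quantities behind the two diagnostic graphs can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a straight-line model fit by least squares, a hypothetical helper named fitted_and_residuals, and made-up data. A statistical package would produce the same columns and draw the scatterplots for you.

```python
def fitted_and_residuals(x, y):
    """Fit y = a + b*x by least squares; return (fitted values, residuals)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    # Residual = actual value minus fitted value.
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Hypothetical data, invented for this illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
fitted, residuals = fitted_and_residuals(x, y)

# Graph 1 plots (actual, fitted); graph 2 plots (fitted, residual).
print(f"{'actual':>8} {'fitted':>8} {'residual':>9}")
for yi, fi, ri in zip(y, fitted, residuals):
    print(f"{yi:8.2f} {fi:8.2f} {ri:9.2f}")
```

In the first graph, points close to the diagonal line fitted = actual indicate good predictions. In the second, residuals scattered evenly around zero with no visible pattern suggest the straight-line model is reasonable, while a curve or funnel shape suggests it is not.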

  8.2.1 Definitions and Formulas
  8.2.2 Worked Examples
  8.2.3 Exploration 8B: How Outliers Influence Regression