So far, we have encountered the idea that variables may be related to each other. Very often, we can use these relationships to determine how one variable that is easier to control affects another variable that we are interested in. To develop such relationships, we can plot the data and try to find an equation that relates the two variables. What we need, though, is a systematic way to decide which equation best fits the data. We will start by using the simplest equations, linear models, to represent the data. The equations for these models will be developed using least squares regression analysis. In this technique, we assume a line exists that fits the data; by adjusting the slope and y-intercept of this line, we can make it fit the data better. The "best fit" occurs when a certain quantity, the total squared error, is made as small as possible. You have already explored this in chapter 7 with the idea of trendlines. Your software probably calculates all its trendlines using least squares regression.
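The idea above can be sketched directly in code. This is a minimal illustration, not your software's actual implementation: the function names and the data are made up for the example, but the formulas are the standard closed-form least squares solution, which minimizes the total squared error described in the text.

```python
def least_squares(x, y):
    """Return the slope and y-intercept of the least squares line,
    i.e., the line that makes the total squared error as small as possible."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least squares solution for slope and intercept
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def total_squared_error(x, y, slope, intercept):
    """Sum of the squared vertical distances from each data point to the line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
```

For data that lies exactly on a line, such as `x = [1, 2, 3, 4]` and `y = [2, 4, 6, 8]`, the fit returns slope 2 and intercept 0, and the total squared error is 0; any other choice of slope and intercept gives a larger error.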
Not all data looks like a straight line when it is graphed. Since we can always find a "best fit" line for the data, we need some way of determining whether the linear regression equation is a good choice. To decide this, we will make use of several statistical measures and some diagnostic graphs. These will help us answer two important questions: Is the data close enough to linear to make a linear regression equation worth using? If we use the regression equation to predict information, what kind of error can we expect to have in our estimates?
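One standard statistical measure of this kind, used here as an illustrative example (the chapter's specific measures are introduced later), is the correlation coefficient r. A value near +1 or -1 suggests the data is nearly linear, while a value near 0 suggests a line fits poorly.

```python
import math

def correlation(x, y):
    """Pearson correlation coefficient r for paired data.
    |r| near 1 suggests a strong linear relationship; near 0, a weak one."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)
```

For the perfectly linear data `x = [1, 2, 3, 4]`, `y = [2, 4, 6, 8]`, r is exactly 1; for the parabolic data `x = [-2, -1, 0, 1, 2]`, `y = [4, 1, 0, 1, 4]`, r is 0, even though the variables are clearly related, which is why diagnostic graphs are used alongside the numbers.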
| As a result of this chapter, students will learn | As a result of this chapter, students will be able to |
| --- | --- |
|  |  |