Mechanics and Techniques Problems

8.1. Suppose you know the statistics for X and Y shown below. You also know that the correlation between X and Y is 0.56. Use these to determine the equation of the least-squares regression line that predicts Y as a function of X. Produce a graph of this regression equation. Show all work.

| Statistic | X-Variable | Y-Variable |
|---|---|---|
| Mean | 15.27 | 107.93 |
| Standard Deviation | 7.82 | 38.77 |
| First Quartile | 5.3 | 47.1 |
| Median | 15.2 | 105.4 |
| Third Quartile | 22.6 | 160.3 |
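
As a hedged illustration of the kind of computation this problem calls for, the sketch below uses the standard identities for a least-squares line built from summary statistics: slope = r × (s_Y / s_X), and intercept = mean(Y) − slope × mean(X). The function name `regression_from_summary` is hypothetical, not something defined in the text, and the sketch is a way to check hand work rather than a substitute for showing it.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the least-squares line for Y on X.

    Uses slope = r * sd_y / sd_x and the fact that the fitted line
    passes through the point of means (mean_x, mean_y).
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Values taken from the table in problem 8.1.
slope, intercept = regression_from_summary(
    mean_x=15.27, sd_x=7.82, mean_y=107.93, sd_y=38.77, r=0.56
)
print(f"Predicted Y = {intercept:.2f} + {slope:.2f} * X")

# Two points are enough to draw the line by hand or in a plotting tool:
# the point of means and the Y-intercept.
points_for_graph = [(0.0, intercept), (15.27, slope * 15.27 + intercept)]
```

Only the means, standard deviations, and correlation enter the calculation; the quartiles and medians in the table are not needed for the regression equation itself.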

8.2. The regression output below was developed from data relating the monthly usage of electricity (MonthlyUsage, measured in kilowatt-hours) to the size of homes (HomeSize, measured in square feet). One-variable statistics for each of these variables are also given below.

  1. Use this information to write down the equation of the regression model. Explain what each part of the regression model means, paying particular attention to the units of the coefficients in the regression equation.
  2. Analyze the quality of the regression model you wrote down, based on the summary statistics in the regression output and the statistics on the X and Y variables.
  3. Based on your regression model, what is the relationship between home size and monthly usage? Does this seem realistic? (Hint: What does the model predict for bigger and bigger homes? What about smaller homes? Are there any homes for which the model predicts a monthly usage of zero?)









Results of simple regression for Monthly Usage

Summary measures

| Measure | Value |
|---|---|
| Multiple R | 0.9120 |
| R-Square | 0.8317 |
| StErr of Est | 133.4377 |

ANOVA Table

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Explained | 1 | 703957.1781 | 703957.1781 | 39.5357 | 0.0002 |
| Unexplained | 8 | 142444.9219 | 17805.6152 | | |

Regression coefficients

| | Coefficient | Std Err | t-value | p-value | Lower limit | Upper limit |
|---|---|---|---|---|---|---|
| Constant | 578.9277 | 166.9681 | 3.4673 | 0.0085 | 193.8984 | 963.9570 |
| HomeSize | 0.5403 | 0.0859 | 6.2877 | 0.0002 | 0.3421 | 0.7385 |
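
When analyzing output like this, it can help to see how the summary measures relate to the ANOVA table. The sketch below recomputes several of them from the sums of squares, using the usual simple-regression relationships (R-Square = SS explained / SS total, StErr of Est = square root of MS unexplained, F = MS explained / MS unexplained). The variable names are illustrative choices, not part of the regression output.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_explained = 703957.1781
ss_unexplained = 142444.9219
df_explained = 1
df_unexplained = 8

ss_total = ss_explained + ss_unexplained

# Fraction of the variation in MonthlyUsage accounted for by the model.
r_square = ss_explained / ss_total
multiple_r = math.sqrt(r_square)

# Mean squares are sums of squares divided by their degrees of freedom.
ms_explained = ss_explained / df_explained
ms_unexplained = ss_unexplained / df_unexplained

# Standard error of estimate: typical size of a residual, in kilowatt-hours.
sterr_of_est = math.sqrt(ms_unexplained)

# F statistic for the overall significance of the regression.
f_stat = ms_explained / ms_unexplained

print(f"R-Square     = {r_square:.4f}")
print(f"Multiple R   = {multiple_r:.4f}")
print(f"StErr of Est = {sterr_of_est:.4f}")
print(f"F            = {f_stat:.4f}")
```

The recomputed values should agree with the reported ones up to rounding, which is a quick way to confirm the table was read correctly.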











Summary measures for selected variables

| Statistic | HomeSize | MonthlyUsage |
|---|---|---|
| Mean | 1880.000 | 1594.700 |
| Median | 1775.000 | 1641.000 |
| Standard deviation | 517.623 | 306.667 |
| Minimum | 1290.000 | 1172.000 |
| Maximum | 2930.000 | 1956.000 |
| Variance | 267933.333 | 94044.678 |
| First quartile | 1502.500 | 1321.250 |
| Third quartile | 2167.500 | 1831.000 |
| Interquartile range | 665.000 | 509.750 |
| Skewness | 0.893 | -0.308 |
| Kurtosis | 0.340 | -1.565 |
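
One way to explore the questions in problem 8.2 is to evaluate the fitted line at a few home sizes. The sketch below assumes the model has the form MonthlyUsage = Constant + (HomeSize coefficient) × HomeSize, with both coefficients read from the regression output; the function `predict_usage` and the other names are hypothetical. It also cross-checks the coefficients against the summary statistics using slope = r × (s_Y / s_X).

```python
# Coefficients read from the regression output for problem 8.2.
CONSTANT = 578.9277   # kilowatt-hours
SLOPE = 0.5403        # kilowatt-hours per square foot


def predict_usage(home_size_sqft):
    """Predicted monthly usage (kWh) for a home of the given size."""
    return CONSTANT + SLOPE * home_size_sqft


# Cross-check: rebuild the coefficients from the summary statistics.
r = 0.9120
mean_size, sd_size = 1880.000, 517.623
mean_usage, sd_usage = 1594.700, 306.667
slope_check = r * sd_usage / sd_size
constant_check = mean_usage - slope_check * mean_size

# Predictions at the smallest and largest homes in the data.
usage_at_min = predict_usage(1290.0)
usage_at_max = predict_usage(2930.0)

# Home size at which the line would predict zero usage.
zero_usage_size = -CONSTANT / SLOPE

print(f"slope check: {slope_check:.4f}, constant check: {constant_check:.2f}")
print(f"predicted usage at 1290 sq ft: {usage_at_min:.1f} kWh")
print(f"predicted usage at 2930 sq ft: {usage_at_max:.1f} kWh")
print(f"zero usage predicted at: {zero_usage_size:.1f} sq ft")
```

Comparing these predictions with the observed minimum and maximum of MonthlyUsage, and noting where the zero-usage home size falls relative to the observed range of HomeSize, gives concrete material for judging whether the linear model is realistic.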



8.3. Pie in the Sky, Inc. runs a chain of pizza eateries. (See data file C08 Pizza.xls [.rda].) The manager has collected data from each of the stores in the chain regarding the number of pizzas sold in one month, the average price of the pizzas, the amount the store spent on advertising that month, and the average disposable income of families in the area near the store.

  1. The manager wants to know how these variables are related. Specifically, he wants to know which variable is the best to use for predicting the number of pizzas that a given store will sell in a month. Develop regression models to predict the quantity sold based on each variable in the data. Use the three models you develop to determine which variable is the most influential.
  2. Use your best model to determine how many pizzas will be sold if a store has an average pizza price of $11.00, spends $51,000 on advertising, and is in a region with an average disposable income of $40,000.
  3. Based on the models that you developed, if a store wanted to sell 80,000 pizzas, what should the store do?
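
The workflow in problem 8.3 (fit one simple regression per explanatory variable, compare the fits, then use a model forward for prediction and backward for a sales target) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, R-Square is used as the comparison criterion as one reasonable choice, and the tiny data set is made up purely to exercise the code. It is not the contents of C08 Pizza.xls, whose column names are not given here.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_square)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_square = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_square


def rank_predictors(predictors, y):
    """Fit one model per predictor and sort by R-Square, best first."""
    fits = {name: fit_simple_regression(x, y) for name, x in predictors.items()}
    return sorted(fits.items(), key=lambda item: item[1][2], reverse=True)


def solve_for_x(target_y, slope, intercept):
    """Invert the fitted line: the x value at which the model predicts target_y."""
    return (target_y - intercept) / slope


# Made-up placeholder data (NOT the pizza data) to show the calls.
quantity = [10.0, 12.0, 14.0, 16.0]
predictors = {
    "predictor_a": [1.0, 2.0, 3.0, 4.0],   # perfectly linear with quantity
    "predictor_b": [4.0, 1.0, 3.0, 2.0],   # weakly related to quantity
}

ranking = rank_predictors(predictors, quantity)
best_name, (best_slope, best_intercept, best_r2) = ranking[0]
needed_x = solve_for_x(20.0, best_slope, best_intercept)
print(best_name, best_slope, best_intercept, best_r2, needed_x)
```

With the real data, each list in `predictors` would hold one column (price, advertising, or income) and `quantity` the number of pizzas sold. Part 3 then amounts to calling `solve_for_x` with a target of 80,000 for each model and asking which of the resulting values a store could actually control.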