Statistics
While this is older news, the 2016 election was an interesting one because nearly all of the major election polls favored Hillary Clinton. That turned out to be wrong: Donald Trump was declared the winner after carrying several states by very narrow margins.
Read at least two of the articles and write about why the election polls went wrong and what, if anything, could be done to make election polls more accurate. Do you think the election would have turned out the same if people had known that Trump was in the lead? Will this information change how you view the polls and statistics you hear and read about in the news? Make sure to cite the sources you use.
https://www.telegraph.co.uk/news/2016/11/09/how-wrong-were-the-polls-in-predicting-the-us-election/
https://www.chronicle.com/article/academic-pollsters-didnt-see-all-those-trump-voters-coming-either-why-not/
https://www.wpr.org/polls-missed-mark-2016-experts-say-things-are-different-2020
https://www.usatoday.com/story/news/politics/elections/2016/2016/11/09/pollsters-donald-trump-hillary-clinton-2016-presidential-election/93523012/
https://www.huffpost.com/entry/pollsters-and-forecasters-had-a-rough-night_n_5822c343e4b0e80b02cdee13
https://www.pewresearch.org/fact-tank/2016/11/09/why-2016-election-polls-missed-their-mark/
PART 2
A video on how to do linear regression in Excel is also provided. It is IMPERATIVE that you have Microsoft Excel for this assignment and the remainder of this course, as we will be using it exclusively to solve the problems.
Linear Regression in Excel
6. (13.5) Zagat publishes restaurant ratings for various locations in the United States. The file RESTAURANT contains the Zagat ratings for food, decor, and service, and the cost per person, for a sample of 100 restaurants located in New York City and in a suburb of New York City. Develop a regression model to predict the cost per person, based on a variable that represents the sum of the ratings for food, decor, and service.
a. Construct a scatter plot in Excel.
b. Assuming a linear relationship, find b0 and b1.
c. Interpret the meaning of the Y intercept b0 and the slope b1 in this problem.
d. Predict the mean cost per person for a restaurant with a summated rating of 50.
e. What should you tell the owner of a group of restaurants in this geographical area about the relationship between the summated rating and the cost of a meal?
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.
Chap 13-*
Chapter 13
Regression Analysis:
PART 1: Simple Linear Regression
Basic Business Statistics
10th Edition
Learning Objectives
In this chapter, you learn:
How to use regression analysis to predict the value of a dependent variable (Y) based on an independent variable (X), where X is assumed to influence Y
How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
To make inferences about the slope in a linear regression (the linear relationship between X and Y)
To estimate mean values and predict individual values
Analysis when Two Variables are Related
A scatter diagram can be used to show the relationship between two variables
Scatter diagrams were first presented in Ch. 2
Correlation analysis is used to measure strength of the association (linear relationship) between two variables (Ch. 3)
Correlation is only concerned with strength of the relationship
No causal effect is implied with correlation
Regression analysis is used when a direction of influence is assumed
Changes in X are assumed to cause changes in Y (the regression by itself does not prove causation)
Correlation Coefficient (r)
–1 ≤ r ≤ 1
The closer to –1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
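As a rough sketch (not part of the original slides), r can be computed straight from its definition. The code assumes Python 3 and borrows the 10-house sample used later in this chapter (price in $1000s as Y, square feet as X); `pearson_r` is a hypothetical helper name. For this data it should agree with Excel's CORREL and with the "Multiple R" value in the Excel regression output.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Co-variation of X and Y, and variation of each, about their means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# House data from this chapter: price in $1000s (Y) and size in square feet (X)
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

print(round(pearson_r(sqft, price), 5))  # 0.76211
```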
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y vs. X with r = -1, r = -0.6, r = 0, r = +0.3, r = +1, and r = 0]
Introduction to Regression Analysis
Regression analysis is used to:
Explain the impact of changes in an independent variable on changes in the dependent variable
Predict the value of a dependent variable based on the value of one or more independent variables
Dependent variable (Y): the variable we wish to predict or explain
Independent variable (X): the variable used to explain the dependent variable
Simple Linear Regression Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
Dependent variable (Y) = house price
Independent variable (X) = square feet
Sample Data for House Price Model
House Price (Y)  Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Types of Relationships
[Figure: scatter plots of Y vs. X illustrating linear relationships and non-linear relationships]
Simple Linear Regression Model
Only one independent variable, X
Relationship between X and Y is described by a linear function:
Y = intercept + slope(X)
Y = b0 + b1X (also written Y = a + bX)
Changes in Y are assumed to be caused by changes in X (an assumption of the model, not something the regression proves)
Linear Relationships
[Figure: scatter plots of Y vs. X contrasting strong linear relationships and weak linear relationships]
Linear Relationships
(continued)
[Figure: scatter plots of Y vs. X showing no relationship]
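Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Dependent variable: $Y_i$; independent variable: $X_i$
Population Y intercept: $\beta_0$; population slope coefficient: $\beta_1$ (population parameters)
Random error term: $\varepsilon_i$
Linear component: $\beta_0 + \beta_1 X_i$; random error component: $\varepsilon_i$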
The simple linear regression equation provides an estimate of the population regression line
Regression equation (sample statistics): $\hat{Y}_i = b_0 + b_1 X_i$
$b_0$ = estimate of the regression intercept
$b_1$ = estimate of the regression slope
$\hat{Y}_i$ = estimated (or predicted) Y value for observation i
$X_i$ = value of X for observation i
Formulas: Slope and Intercept
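Slope: $b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
Intercept: $b_0 = \bar{Y} - b_1 \bar{X}$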
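The two formulas translate almost line for line into code. This is a minimal sketch assuming Python 3; `least_squares_fit` is a hypothetical name, and the data is the 10-house sample from this chapter. It should reproduce the Coefficients column of the Excel output (about 98.24833 and 0.10977).

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the prediction line Y-hat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / ssx
    # b0 = Y_bar - b1 * X_bar
    b0 = y_bar - b1 * x_bar
    return b0, b1


price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

b0, b1 = least_squares_fit(sqft, price)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")  # b0 = 98.24833, b1 = 0.10977
```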
Interpretation of the Slope and the Intercept
INTERCEPT: b0 is the estimated average value of Y when the value of X is zero
SLOPE: b1 is the estimated change in the average value of Y as a result of a one-unit change in X
Simple Linear Regression Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
A random sample of 10 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
Sample Data for House Price Model
House Price in $1000s (Y)  Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Graphical Presentation
House price model: scatter plot
[Figure: scatter plot of House Price ($1000s) vs. Square Feet for the 10 sample houses]
Regression Using Excel
Data tab > Data Analysis > Regression (the Data Analysis command requires the Analysis ToolPak add-in)
Excel Output
The regression equation is: $\widehat{\text{house price}} = 98.24833 + 0.10977(\text{square feet})$
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Graphical Presentation
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) vs. Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
Interpretation of the Intercept, b0
b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
Interpretation of the Slope Coefficient, b1
b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977($1000) = $109.77, on average, for each additional square foot of size
Predictions using Regression Analysis
Predict the price for a house with 2000 square feet:
$\widehat{\text{house price}} = 98.25 + 0.1098(\text{sq.ft.}) = 98.25 + 0.1098(2000) = 317.85$
The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850
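A small sketch of the same prediction, assuming Python 3 (`predict` is a hypothetical helper). It also shows that coefficient rounding matters a little: the slide's rounded coefficients give 317.85, while the five-decimal Excel coefficients give about 317.79, a difference of roughly $60. Note that 2000 square feet lies inside the observed range of sizes (1100 to 2450), so this is not an extrapolation.

```python
def predict(b0, b1, x):
    """Predicted Y for a given X on the line Y-hat = b0 + b1 * X."""
    return b0 + b1 * x


# Coefficients rounded as on the slide
print(round(predict(98.25, 0.1098, 2000), 2))  # 317.85
# Coefficients with five decimals, as printed by Excel
print(round(predict(98.24833, 0.10977, 2000), 2))  # 317.79
```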
Coefficient of Determination (r²)
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
The coefficient of determination is also called R-squared and is denoted as r² (also, R²)
$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$
Note: $0 \le r^2 \le 1$
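To see where SSR and SST come from, here is a minimal sketch assuming Python 3 and the house-price sample from this chapter; `r_squared` is a hypothetical name. It fits the least-squares line, then compares the spread of the predicted values (SSR) with the total spread of Y (SST). The results should match the ANOVA table in the Excel output.

```python
def r_squared(x, y):
    """Return (r2, SSR, SST) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    # SST: total variation of Y about its mean
    sst = sum((yi - y_bar) ** 2 for yi in y)
    # SSR: variation of the predicted values about the mean of Y
    ssr = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
    return ssr / sst, ssr, sst


price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

r2, ssr, sst = r_squared(sqft, price)
print(f"SSR = {ssr:.4f}, SST = {sst:.4f}, r2 = {r2:.5f}")
# SSR = 18934.9348, SST = 32600.5000, r2 = 0.58082
```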
Examples of Approximate r² Values
[Figure: two scatter plots of Y vs. X with every point on a straight line; r² = 1 in each]
r² = 1
Perfect linear relationship between X and Y:
100% of the variation in Y is explained by variation in X
Examples of Approximate r² Values
[Figure: two scatter plots of Y vs. X with points scattered around a line; 0 < r² < 1]
0 < r² < 1
Weaker linear relationships between X and Y:
Some but not all of the variation in Y is explained by variation in X
Examples of Approximate r² Values
[Figure: scatter plot of Y vs. X with no linear pattern; r² = 0]
r² = 0
No linear relationship between X and Y:
The value of Y does not depend linearly on X. (None of the variation in Y is explained by variation in X.)
R-squared from Excel Output
58.08% of the variation in house prices is explained by variation in square feet
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Assumptions of Regression
Use the acronym LINE:
Linearity
The underlying relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values (ε) are normally distributed for any given value of X
Equal Variance (Homoscedasticity)
The probability distribution of the errors has constant variance
Hypothesis testing about the Slope: t Test
t test for a population slope
Is there a linear relationship between X and Y?
Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
Test statistic: $t = \frac{b_1 - \beta_1}{S_{b_1}}$, with d.f. = n − 2
where:
b1 = regression slope coefficient
β1 = hypothesized slope
$S_{b_1}$ = standard error of the slope
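Standard Error of the Slope
The standard error of the regression slope coefficient (b1) is estimated by
$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$
where:
$S_{b_1}$ = estimate of the standard error of the least squares slope
$S_{YX} = \sqrt{\frac{SSE}{n-2}}$ = standard error of the estimate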
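Both standard errors can be computed from the residuals. This is a sketch assuming Python 3 and the house-price sample from this chapter; `slope_standard_error` is a hypothetical name. The two numbers should match the "Standard Error" entry under Regression Statistics (S_YX) and the standard error of the Square Feet coefficient (S_b1) in the Excel output.

```python
import math


def slope_standard_error(x, y):
    """Return (S_YX, S_b1) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    # SSE: sum of squared residuals around the fitted line
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))  # standard error of the estimate
    s_b1 = s_yx / math.sqrt(ssx)  # standard error of the slope
    return s_yx, s_b1


price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

s_yx, s_b1 = slope_standard_error(sqft, price)
print(f"S_YX = {s_yx:.5f}, S_b1 = {s_b1:.5f}")  # S_YX = 41.33032, S_b1 = 0.03297
```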
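Standard Error from Excel Output: $S_{YX} = 41.33032$ (Standard Error) and $S_{b_1} = 0.03297$ (standard error of the Square Feet coefficient)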
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
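Hypothesis Testing about the Slope: t Test
Simple linear regression equation: $\widehat{\text{house price}} = 98.25 + 0.1098(\text{sq.ft.})$
The slope of this model is 0.1098.
Is the square footage of a house linearly related to its sales price at the 95% confidence level (α = 0.05)?
House Price (Y)  Square Feet (X)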
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From the Excel output:
$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039
Critical Value for the t Statistic
Use the t table on pages 814-815.
The critical value depends on:
degrees of freedom: df = n – 2
significance level: α
A 95% confidence level (CL) has α = 0.05
A 90% CL has α = 0.10
t Table (p. 814-815), excerpt
df | Upper tail area .25 | … | .025
1 | 1.000 | … | 12.7062
… | … | … | …
8 | 0.7064 | … | 2.3060
The body of the table contains t values, not probabilities.
Let n = 10, so df = n − 2 = 8.
95% CL: α = 0.05, so α/2 = 0.025 in each tail, and the critical values are ±2.3060.
Result (continued)
H0: β1 = 0
H1: β1 ≠ 0
d.f. = 10 − 2 = 8; α/2 = 0.025 in each tail
Reject H0 if t < −2.3060 or t > 2.3060; do not reject H0 if −2.3060 ≤ t ≤ 2.3060
Test statistic (from Excel output): $t = \frac{b_1 - \beta_1}{S_{b_1}} = 3.329$
Decision: Reject H0, since 3.329 > 2.3060
Conclusion: There is sufficient evidence that square footage is linearly related to house price
Coefficients Standard Error t Stat P-value
Intercept 98.24833 58.03348 1.69296 0.12892
Square Feet 0.10977 0.03297 3.32938 0.01039
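The decision rule can be written as a short function. This is a sketch assuming Python 3; `slope_t_test` is a hypothetical name, and the critical value 2.3060 is hard-coded from the t table because Python's standard library has no t distribution (a package such as SciPy would be needed to compute it). The inputs are the rounded Excel values for b1 and S_b1.

```python
def slope_t_test(b1, s_b1, t_critical, beta1_null=0.0):
    """Two-tailed t test for the slope; returns (t statistic, reject H0?)."""
    t_stat = (b1 - beta1_null) / s_b1
    # Reject H0 when t falls in either tail beyond the critical value
    return t_stat, abs(t_stat) > t_critical


# Values from the Excel output and the t table (df = n - 2 = 8, alpha/2 = 0.025)
t_stat, reject = slope_t_test(b1=0.10977, s_b1=0.03297, t_critical=2.3060)
print(f"t = {t_stat:.3f}, reject H0: {reject}")  # t = 3.329, reject H0: True
```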
Confidence Interval Estimate for the Slope
Confidence interval estimate of the slope: $b_1 \pm t_{\alpha/2} S_{b_1}$, with d.f. = n − 2
At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)
Excel printout for house prices:
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
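The interval is just the estimate plus or minus a margin of error. A minimal sketch, assuming Python 3, using the rounded Excel values and the table value t = 2.3060 for 8 degrees of freedom; `slope_confidence_interval` is a hypothetical name. It reproduces the Lower 95% and Upper 95% columns above.

```python
def slope_confidence_interval(b1, s_b1, t_critical):
    """Return (lower, upper) for b1 +/- t_{alpha/2} * S_b1."""
    margin = t_critical * s_b1
    return b1 - margin, b1 + margin


lower, upper = slope_confidence_interval(b1=0.10977, s_b1=0.03297, t_critical=2.3060)
print(f"({lower:.5f}, {upper:.5f})")  # (0.03374, 0.18580)
```

Because the whole interval lies above 0, the same data lead to rejecting H0: β1 = 0 at α = 0.05.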
Confidence Interval of Slope
Since the confidence interval above does not contain 0, we reject the null hypothesis H0: β1 = 0 at the 0.05 level of significance.
CONCLUSION: We are 95% confident that there is a positive linear relationship between the size of a house and its price.
Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying regression methodology
Not knowing how to evaluate the assumptions
Not knowing the alternatives to regression if a particular assumption is violated
Using a regression model without knowledge of the subject matter
Extrapolating outside the relevant range
Strategies for Avoiding the Pitfalls of Regression
Start with a scatter diagram of X vs. Y to observe possible relationship
Perform residual analysis to check the assumptions
Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality
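Residual analysis starts from the residuals e_i = Y_i − Ŷ_i. The sketch below assumes Python 3 and the house-price sample; `residuals` is a hypothetical name. It lists the residuals in order of X, which is a text stand-in for the residual-vs-X plot: look for curvature (non-linearity) or a spread that grows or shrinks with X (unequal variance). Least-squares residuals always sum to approximately zero.

```python
def residuals(x, y):
    """Return (predicted, residual) pairs for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    out = []
    for xi, yi in zip(x, y):
        y_hat = b0 + b1 * xi
        out.append((y_hat, yi - y_hat))  # residual = observed - predicted
    return out


price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X

# Print residuals sorted by X, the same order a residual-vs-X plot would show
for xi, (y_hat, e) in sorted(zip(sqft, residuals(sqft, price))):
    print(f"X = {xi:5d}  predicted = {y_hat:8.3f}  residual = {e:8.3f}")
```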
Strategies for Avoiding the Pitfalls of Regression (continued)
If any assumption is violated, use alternative methods or models
If there is no evidence of assumption violation, test for the significance of the regression coefficients and construct confidence intervals
Avoid making predictions or forecasts outside the relevant range
Chapter 13 Formula Summary
Population regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Prediction line: $\hat{Y}_i = b_0 + b_1 X_i$
Slope: $b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$; intercept: $b_0 = \bar{Y} - b_1 \bar{X}$
House price model: $\widehat{\text{house price}} = 98.24833 + 0.10977(\text{square feet})$
Prediction for 2000 square feet: $\widehat{\text{house price}} = 98.25 + 0.1098(2000) = 317.85$
Coefficient of determination: $r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$, with $0 \le r^2 \le 1$; here $r^2 = \frac{18934.9348}{32600.5000} = 0.58082$
Standard error of the estimate: $S_{YX} = \sqrt{\frac{SSE}{n-2}}$
Standard error of the slope: $S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$; here $S_{b_1} = 0.03297$
t statistic for the slope: $t = \frac{b_1 - \beta_1}{S_{b_1}}$, d.f. $= n - 2$; here $t = \frac{0.10977 - 0}{0.03297} = 3.32938$
Confidence interval for the slope: $b_1 \pm t_{\alpha/2} S_{b_1}$, d.f. $= n - 2$
Regression Analysis using Excel 2007
MTH 305 Statistics
Data Needed in Regression Analysis
At least two variables measured on the same set of observations
Only one variable will be defined as the Y variable. There can be one or more X variables in regression analysis.
Observation ID Variable 1 Variable 2
1
2
3
Data Example
For example, we are interested in analyzing the linear relationship between the amount of sugar and the number of calories in cereals. We are testing whether the amount of sugar (X) helps explain the number of calories (Y).
In Excel, the dataset will look like the one on the next slide.
Data Example
Ways to Check for a Linear Relationship
Scatter Plot between Y and X
Correlation Value
Regression Analysis
SCATTER PLOT
Scatter Plot of Two Variables
Select the data for the two variables you wish to analyze.
On the Insert tab, in the Charts group, select “Scatter”.
Example from the cereal data:
There appears to be little or no linear relationship.
CORRELATION COEFFICIENT
Correlation Value in Excel
In any Excel cell, type:
=CORREL(range of Y data, range of X data)
For example, for the dataset above (cereal data) where Y data are in cells B2 through B19 and X data are in cells C2 through C19, we will type:
=CORREL(B2:B19, C2:C19)
The result, 0.2296, indicates a weak positive linear relationship between these variables.
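For readers who want to check the CORREL value outside Excel, here is a sketch assuming Python 3; `correl` is a hypothetical helper that computes the same sample correlation as =CORREL. The lists are the 18 cereals from the data listed with this deck, in table order (calories as Y in column B, sugar in grams as X in column C).

```python
import math


def correl(y, x):
    """Sample correlation between y and x, like Excel's =CORREL(y_range, x_range)."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


calories = [200, 210, 170, 190, 170, 200, 200, 200, 180,
            200, 170, 180, 190, 200, 190, 190, 200, 190]  # Y (column B)
sugar = [18, 23, 17, 20, 17, 18, 18, 18, 19,
         1, 0, 12, 9, 11, 12, 11, 12, 11]  # X (column C)

print(round(correl(calories, sugar), 4))  # 0.2296
```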
REGRESSION ANALYSIS
Regression Analysis
Under Data > Data Analysis > Regression
Excel Output: Intercept and Slope
The regression equation is: $\widehat{\text{house price}} = 98.24833 + 0.10977(\text{square feet})$
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Excel Output: R-squared
58.08% of the variation in house prices is explained by variation in square feet
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
Excel Output: Standard Error
Regression Statistics
Multiple R 0.76211
R Square 0.58082
Adjusted R Square 0.52842
Standard Error 41.33032
Observations 10
ANOVA
df SS MS F Significance F
Regression 1 18934.9348 18934.9348 11.0848 0.01039
Residual 8 13665.5652 1708.1957
Total 9 32600.5000
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386
Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580
To Plot the Regression Line
In the Regression dialog box, check “Line Fit Plots”.
Regression Line
House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s) vs. Square Feet with the fitted regression line; slope = 0.10977, intercept = 98.248]
MULTIVARIATE REGRESSION
Multivariate Regression
Multivariate (multiple) regression = two or more X variables that influence Y
Scatter plot: create one separately for each pair of X and Y.
Correlation coefficient: compute it separately for each pair of X and Y.
Regression analysis: To analyze how two or more X variables affect Y, follow the same steps as in the case of one X, but select the data in all the X columns at the same time as the Input X Range (Excel requires these columns to be next to each other).
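Behind the scenes, multiple regression still finds least-squares coefficients; they solve the normal equations (X'X)b = X'y, where X has a leading column of 1s for the intercept. The sketch below assumes Python 3 with no external libraries; `fit_multiple` is a hypothetical name, and plain Gaussian elimination is used for simplicity (statistical software generally uses more numerically stable methods, so treat this as illustrative). As a sanity check with a single X column it should reproduce the simple-regression coefficients for the house data.

```python
def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for Y on k X variables.

    rows: one list of X values per observation, e.g. [[x1, x2], ...].
    """
    # Design matrix: a leading 1 for the intercept, then the X values
    design = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    aug = []
    for i in range(k):
        row = [sum(d[i] * d[j] for d in design) for j in range(k)]
        row.append(sum(d[i] * yi for d, yi in zip(design, y)))
        aug.append(row)
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        tail = sum(aug[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (aug[i][k] - tail) / aug[i][i]
    return b


# Sanity check with one X: should reproduce the simple regression coefficients
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # Y, in $1000s
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # X
b = fit_multiple([[x] for x in sqft], price)
print([round(v, 5) for v in b])  # [98.24833, 0.10977]
```

With two or more X variables, each inner list simply holds that observation's X values, mirroring how the X columns sit side by side in Excel.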
Equations from the Excel 2007 slides:
$\widehat{\text{house price}} = 98.24833 + 0.10977(\text{square feet})$
$r^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082$
$S_{b_1} = 0.03297$
Product | Calories | Sugar (grams)
Kellogg's | 200 | 18
Sam's Choice Extra raisin (Wal-Mart) | 210 | 23
Kountry Fresh (Winn-Dixie) | 170 | 17
Post Premium | 190 | 20
American Fare (kmart) | 170 | 17
America's Choice (A&P) | 200 | 18
Safeway | 200 | 18
Kroger | 200 | 18
General Mills Total | 180 | 19
Post The Original Shredded Wheat 'N Bran | 200 | 1
Post The Original Shredded Wheat, Spoon Size | 170 | 0
Kellogg's Raisin Squares Mini-Wheats | 180 | 12
Healthy Choice Toasted Brown Sugar Squares | 190 | 9
Kountry Fresh Frosted Bite Size (Winn-Dixie) | 200 | 11
Post Frosted Bite Size | 190 | 12
Kroger Frosted Bite Size | 190 | 11
Kellogg's Frosted Mini-Wheats | 200 | 12
Safeway Frosted Bite Size | 190 | 11
DATA
Location | Food | Décor | Service | Summated Rating | Coded Location | Cost
City | 2 | 1 | 19 | 2 | 0 | 60 | 62
24 | 20 | 68 | 67
22 | 14 | 50 | 23
27 | 74 | 79
13 | 52 | 32
11 | 18 | 48 | 38
21 | 64 | 46
17 | 55 | 43
16 | 56 | 39
15
26 | 65 | 44
29
66 | 59
57
53
25 | 69
12 | 51
49 | 40
61 | 45
58 | 33
28 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
35 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
54 | 42 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
41 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
63 | 34 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
73 | 78 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Suburban | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
37 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
30 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
36 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
70 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
75 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
31 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||