Regression and Correlation Analysis
3 page paper.
Must follow the instructions on the attached word document!!
Due Wednesday
>Project Data SALESCALL
)
)
3 2 4 2 ONLINE 2 GROUP 1 GROUP 0 GROUP 131 5 GROUP 18.5 1 GROUP 2 GROUP 16.9 4 ONLINE 136 3 ONLINE 1 GROUP 1 GROUP 2 GROUP 3 NONE 3 NONE 3 ONLINE 3 NONE 2 NONE 3 GROUP 146 1 GROUP 2 NONE 0 ONLINE 146 1 NONE 3 ONLINE 18.5 3 GROUP 3 GROUP 2 GROUP 4 ONLINE 1 GROUP 19.4 0 GROUP 3 GROUP 2 GROUP 1 NONE 1 GROUP 2 GROUP 1 GROUP 2 ONLINE 13.2 3 GROUP 0 ONLINE 16.4 3 GROUP 4 NONE 3 ONLINE 2 GROUP 2 ONLINE 4 ONLINE 13.9 1 GROUP 3 ONLINE 4 GROUP 15.8 2 ONLINE 13.2 3 GROUP 15.7 3 NONE 1 GROUP 17.2 3 ONLINE 17.4 3 ONLINE 0 ONLINE 2 NONE 5 NONE 14.5 4 GROUP 0 ONLINE 1 ONLINE 5 ONLINE 5 ONLINE 14.8 3 GROUP 1 ONLINE 19.4 2 ONLINE 2 ONLINE 3 GROUP 1 ONLINE 18.3 0 ONLINE 18.3 2 GROUP 3 ONLINE 2 ONLINE 15.3 3 ONLINE 12.6 4 ONLINE 3 ONLINE 15.1 4 NONE 12.4 2 ONLINE 4 GROUP 2 ONLINE 3 ONLINE 15.2 2 ONLINE 3 GROUP 2 ONLINE 1 GROUP 13.2 3 ONLINE 2 GROUP 15.2 2 ONLINE 13.0 1 ONLINE Course Project Final Part: Regression & Correlation Analysis 1 Regression & Correlation Analysis 1 1. Scatter Plot of Y and X1 Scatter plott of sales and calls shows that there can be a linear trend between the both. The trendline indicates that it looks like higher the number of calls, higher will be sales 2. Best fit line Using the Regression option in Excel Data analysis menu, obtain the following output best fit line equation is Sales=Intercept + Coefficient of Calls *Calls 3. Coefficient of Correlation It denotes the strength of association between two variables. The sign denotes the direction of association. calculate Correlation coefficient as Correl (X1array,Yarray) We get the value as 0.318
This means that calls and sales are slightly positively associated. With increase in one quantity, the other is also showing an increasing trend. 4. Coefficient of Determination It is more commonly known as R squared value. It gives the measure of how close the data points are to the best fit line. In other words, it gives the proportion of variability in dependent variable that can be explained by the independent variable. Higher the R- squared value, better the model is. Excel regression output, get R squared value or Coefficient of Determination as 0.101 ~10% of variability in sales is explained by calls. 5. Utility of Regression model F test can be used to test the utility of the model. Null Hypothesis: Beta coefficient of call = 0; i.e., Calls is NOT linearly associated with sales Alternate Hypothesis: Beta coefficient of call 0; Calls is linearly associated with sales choose significance level, = 0.05. From the regression ANOVA output, get p value of F test as 0.0012 (<0.05) for the given degrees of freedom Since p value < , reject Null hypothesis, concluding that with the given data it can be said that calls are linearly associated with sales.
6. Based on the above findings, it can be said that calls are a good and important variable in predicting sales volume. It has been proved that calls and sales have a positive linear association between them. From the best fit line (Sales = 22.52 + 0.1237 * Calls), we can say that with every call, sales increases by 0.1237units (interpretation of coefficient of calls). 7. 95% Confidence Interval The 95% confidence interval for the coefficient of Calls (1) is [0.0498, 0.1976] Interpretation: 95% confidence interval means that if this regression analysis is to be repeated for other samples from population, 95% of the intervals will contain the true value of 1. In simpler terms, can say that 95% confident that the true value of 1 is interval. 8. Sales = 22.52 + 0.1237 * Calls say calls = 100. 95% confidence interval for 1 = [0.0498, 0.1976] lower limit of Sales value, Y low= 22.52 + 0.0498 * 100 = 27.5 Upper limit of Sales value, Y high = 22.52 + 0.1976 * 100 = 42.28 for calls = 100, Sales can be expected to be in the range of [27.5, 42.28]
Final Project: Regression and Correlation Analysis
Use the dependent variable (labeled Y) and one of the independent variables (labeled X1, X2, and X3) in the data file. Select and use one independent variable throughout this analysis. Use Excel to perform the regression and correlation analysis to answer the following. The week 6 spreadsheet can be helpful in this work. 1. Generate a scatterplot for the specified dependent variable (Y) and the selected independent variable (X), including the graph of the “best fit” line. Interpret. 2. Determine the equation of the “best fit” line, which describes the relationship between the dependent variable and the selected independent variable. 3. Determine the correlation coefficient. Interpret. 4. Determine the coefficient of determination. Interpret. 5. Test the utility of this regression model by completing a hypothesis test of b=0 using α=0.10. Interpret results, including the p-value. 6. Based on the findings in steps 1-5, analyze the ability of the independent variable to predict the dependent variable. 7. Compute the confidence interval for b, using a 95% confidence level. Interpret this interval. 8. Compute the 99% confidence interval for the dependent variable, for a selected value of the independent variable. Each student can choose a value to use for the independent variable (use same value in the next step). Interpret this interval. 9. Using the same chosen value for part (8), estimate the 99% prediction interval for the dependent variable. Interpret this interval. 10. What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain. 11. Describe a business decision that could be made based on the results of this analysis. In other words, how might the business operations change based on these statistical results. 12. See grading rubric below.
Summarize your results from Steps 1-11 in a 3-page report. The report should explain and interpret the results in ways that are understandable to someone who does not know statistics.
Submission: The Word document, summary report should be submitted for questions 1-11. The Excel output can be included as an appendix, if needed.
Format for report:
A. Summary Report B. Steps 1-11 addressed with appropriate output, graphs and interpretations. Be sure to number each step 1-11.
Final Project: Grading Rubric
Category Points % Description Steps 1-11 worth 10 pts. each 110 points 85% Addressed with appropriate output, graphs and interpretations Communication Skills 20 points 15% Writing, grammar, clarity, logic, and cohesiveness Totals 130 points 100% A quality paper will meet or exceed all of the above requirements Part 1 Generate a scatterplot for the specified dependent variable (Y) and the X1 independent variable, including the graph of the “best fit” line. Interpret. Part 2 Determine the equation of the “best fit” line, which describes the relationship between the dependent variable and the selected independent variable. Part 3 Determine the coefficient of correlation. Interpret. Part 4 Determine the coefficient of determination. Interpret. Part 5 Test the utility of this regression model. Interpret results, including the p-value. Part 6 Based on the findings in Steps 1-5, analyze the ability of the independent variable to predict the designated dependent variable. Part 7 Compute the confidence interval for β1 (the population slope) using a 95% confidence level. Interpret this interval. Part 8 Using an interval, estimate the average for the dependent variable for a selected value of the independent variable. Interpret this interval. Part 9 Using an interval, predict the particular value of the dependent variable for a selected value of the independent variable. Interpret this interval. Part10 What can be said about the value of the dependent variable for values of the independent variable that are outside the range of the sample values? Explain. Part 11 Describe a business decision that could be made based on the results of this analysis. In other words, how might the business operations change based on these statistical results.
2
Sales (Y)
Calls (X
1
Time (X2)
Years (X
3
Type
21
116
19.9
NONE
21 116
18.
0
GROUP
31
118
19.
4
ONLINE
33
12
5
16.9
32
1
29
18.5
29
131
14.6
33 131
10.0
42
19.4
40
132
33
1
35
15.3
33
136
41
19.3
45
137
15.7
42
139
12.4
33
140
17.5
41 140
13.9
42
141
12.2
42
142
17.0
35
1
43
17.7
40 143
18.3
32
1
46
15.4
34
15.8
40 146
18.2
43 146
16.4
44
16.5
45 146
15.6
41
1
47
42 147
13.2
42
1
48
10.5
40
1
49
14.2
41 149 13.2 3 GROUP
43 149
12.7
32
1
50
43 150
17.4
46 150
20.7
41
1
51
18.0
43 151
14.3
44 151
15.2
43
1
52
17.2
44 152
16.0
43
1
53
46 153
22.0
40
1
54
41 154
14.5
43 154 15.3 1 ONLINE
43
155
11.2
42
156
18.6
43 156
20.5
45
157
16.3
46 157 19.3 2 ONLINE
44
158
44 158
11.8
42
159
13.6
48
160
46
161
46
162
47 162 16.4 3 ONLINE
43
163
11.7
45
164
44
165
44 165
15.0
44 165
19.2
45
166
19.5
45
167
46 167
10.1
46 167
14.8
46 167 15.8 0 ONLINE
48
168
12.3
48 168
15.9
46
169
48
170
12.1
45
171
47 171
17.3
45
172
12.6
50
173
13.3
49
174
45
175
47 175 13.9 2 GROUP
50 175
15.1
51 175
12.0
54 175 14.2 1 ONLINE
44
177
48
178
50
179
12.8
46
180
49
181
51 181
11.4
51
182
17.9
47
183
13.0
50 183 11.7 1 GROUP
47
185
50 185 16.4 0 ONLINE
46
186
14.1
49 186 17.5 1 GROUP
53
188
11.0
46
189
20.0
52
190
49
191
13.1
54
195
51
198