STATISTICS EXCEL: WEEK 3 URGENT!

 Statistics Excel Week 3 Assignment


Attached

I need only Week 3 done now

I need in 6 hours from now maximum


D a

ta

ompa

idpoint

ge

another page to make changes

1

.2

6

5

8 0

0 M

2

.7

3

7 0

0 M

3

5

31

5 1

1

B

4

0

0 5.5 1 M E

5

.9

48

16 0 5.7 1 M D

6

36

0

1 M F

– Age in years

7

8 1 5.7 1 F C

8

.4

32 90 9 1

1 F A

– salary grade midpoint

9

1

67

100 10 0 4 1 M F

10

.6

23 30 80 7 1

1 F A

or

)

23

100

1

1 F A

12

57 52

22 0 4.5 0 M E

40 30 100 2 1 4.7 0 F C

14

23 32 90 12 1 6 1 F A

15 23

23 32 80 8 1

1 F A

16

.1

7

40

90 4 0 5.7 0 M C

17

.9

57 27 55 3 1 3 1 F E

18

.1

31 31 80 11 1

0 F B

19

.4

23 32 85 1 0

1 M A

.7

31 44 70 16 1 4.8 0 F B

21

67

95 13 0

1 M F

22

48 48 65 6 1 3.8 1 F D

23

23 36 65 6 1

0 F A

24

48 30 75 9 1 3.8 0 F D

23 41 70 4 0 4 0 M A

23 22 95 2 1

0 F A

27

40

80 7 0 3.9 1 M C

67 44 95 9 1 4.4 0 F F

29

67 52 95 5 0

0 M F

30

48

90 18 0 4.3 0 M D

31

23 29 60 4 1 3.9 1 F A

32

31 25 95 4 0 5.6 0 M B

62

57 35 90 9 0 5.5 1 M E

34

31 26 80 2 0 4.9 1 M B

35

0.994 23 23 90 4 1

0 F A

36

23 27 75 3 1 4.3 0 F A

37

23 22 95 2 1 6.2 0 F A

38

57 45 95 11 0 4.5 0 M E

31 27 90 6 1 5.5 0 F B

40

23 24 90 2 0 6.3 0 M A

41

.3

40 25 80 5 0 4.3 0 M C

42

23 32 100 8 1 5.7 1 F A

43

1.110 67 42 95 20 1 5.5 0 F F

44

1.078 57 45 90 16 0

1 M E

45

48 36 95 8 1 5.2 1 F D

57 39 75 20 0 3.9 1 M E

47

57 37 95 5 0 5.5 1 M E

48

57 34 90 11 1 5.3 1 F E

49

57 41 95 21 0

0 M E

50 63.5 1.114 57 38 80 12 0 4.6 0 M E
ID Salary C M A Performance Rating Service Gender Raise Degree Gender

1 Grade Do not manipulate Data set on this page, copy

to
6 0 1.0

5 5

7 3 4 8 5.7 E The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)?
27 0.8

9 31 52 80 3.9 B Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work.
3

5.5 1.

14 30 75 3.6 F
56.1 0.9

85 57 42 10 16 The column labels in the table mean:
48 1.0

18 36 90 ID – Employee sample number Salary – Salary in thousands
74.1 1.106 67 70 12 4.5 Age Performance Rating – Appraisal rating (employee evaluation score)
42.2 1.0

55 40 32 100 Service – Years of service (rounded) Gender – 0 = male, 1 = female
21 0.9

29 23 5.8 Midpoint Raise – percent of last raise
77.1 1.

15 49 Grade – job/pay grade Degree (0= BS\BA 1 = MS)
22 0.983 4.7 Gender1 (

Male Female Compa-ratio – salary divided by midpoint
11 2

3.8 1.036 41 19 4.8
67.4 1.183 95
13 40.2 1.004
23.7 1.032
1.000 4.9
47 1.

17 44
65 1.156
37 1.197 5.6
24 1.0

62 4.6
20 34 1.120
7

4.4 1.110 43 6.3
55.7 1.161
24.7 1.074 3.3
53 1.104
25 2

4.3 1.056
26 23.5 1.021 6.2
41.7 1.043 35
28 77.2 1.152
77.7 1.1

60 5.4
47.7 0.994 45
23.9 1.0

38
27.1 0.875
33 1.088
27.6 0.890
22.9 5.3
22.4 0.975
23.4 1.017
65.8 1.155
39 35.1 1.131
24.8 1.078
50 1.257
23.1 1.006
74.4
61.4 5.2
48.5 1.010
46 59.4 1.042
63.5 1.114
66.4 1.165
60.9 1.068 6.6

Week 1

1

to columns T and U at the right.

light the mean, sample standard deviation, and range.

Male Female
2

Midpoint
3

Male Female

What is the normal curve z value for each midpoint within overall range?

4

For full credit, show the excel formulas in each cell rather than simply the numerical answer.

Male Female

5

Week 1: Descriptive Statistics, including Probability
While the lectures will examine our equal pay question from the compa-ratio viewpoint, our weekly assignments will focus on
examining the issue using the salary measure.
The purpose of this assignment is threefold:
1. Demonstrate mastery with Excel tools.
2. Develop descriptive statistics to help examine the question.
3. Interpret descriptive outcomes
The first issue in examining salary data to determine if we – as a company – are paying males and females equally for doing equal work is to develop some
descriptive statistics to give us something to make a preliminary decision on whether we have an issue or not.
Descriptive Statistics: Develop basic descriptive statistics for Salary
The first step in analyzing data sets is to find some summary descriptive statistics for key variables.
Suggestion: Copy the gender1 and salary columns from the Data tab.
Then use Data Sort (by gender1) to get all the male and female salary values grouped together.
a. Use the Descriptive Statistics function in the Data Analysis tab
to develop the descriptive statistics summary for the overall
group’s salary. Place the Excel outcome in cell K19 (enter K19 as the output range).
High
b. Using Fx (or formula) functions find the following (be sure to show the formula
and not just the value in each cell) asked for salary statistics for each gender:
Mean:
Sample Standard Deviation:
Range:
Develop a 5-number summary for the overall, male, and female SALARY variable.
For full credit, use the excel formulas in each cell rather than simply the numerical answer.
Overall Males Females
Max
3rd Q
1st Q
Min
Location Measures: comparing Male and Female midpoints to the overall Salary data range.
For full credit, show the excel formulas in each cell rather than simply the numerical answer.
Using the entire Salary range and the M and F midpoints found in Q2
a. What would each midpoint’s percentile rank be in the overall range? Use Excel’s =PERCENTRANK.EXC function
b. Use Excel’s =STANDARDIZE function
Probability Measures: comparing Male and Female midpoints to the overall Salary data range
Using the entire Salary range and the M and F midpoints found in Q2, find
a. The Empirical Probability of equaling or exceeding (>=) that value for each group. Show the calculation formula: =value/50 or =COUNTIF(range,">="&cell)/50
b. The Normal curve probability of >= that value for each group. Use the "=1-NORM.S.DIST" function.
Note: be sure to use the ENTIRE salary range for part a when finding the probability.
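
As a rough illustration of the two probability measures in parts a and b, here is a sketch using Python's standard library rather than Excel. The salary list and the value 40 are made-up placeholders, not the course data, and the function names are hypothetical. The normal-curve step assumes the value is first standardized with the sample mean and sample standard deviation, mirroring =STANDARDIZE followed by =1-NORM.S.DIST(z, TRUE).

```python
from statistics import NormalDist, mean, stdev

def empirical_prob_at_least(values, x):
    # Share of observations that are >= x (same idea as a COUNTIF divided by n).
    return sum(1 for v in values if v >= x) / len(values)

def normal_prob_at_least(values, x):
    # Normal-curve probability of >= x, using the sample mean and sample SD.
    z = (x - mean(values)) / stdev(values)   # z-score of x within the data
    return 1 - NormalDist().cdf(z)           # upper-tail area of the standard normal

# Hypothetical salaries (in thousands), for illustration only.
salaries = [20, 30, 40, 50, 60]
print(empirical_prob_at_least(salaries, 40))
print(normal_prob_at_least(salaries, 40))
```

The two answers can differ noticeably when the data are not close to normally distributed, which is part of what the comparison is meant to show.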
Conclusions: What do you make of these results? Be sure to include findings from this week’s lectures as well.
In comparing the overall, male, and female outcomes, what relationship(s) seem to exist between the data sets?
Your findings:
The lecture’s related findings:
Overall conclusion:
What does this suggest about our equal pay for equal work question?

Week 2

1

a

b

Step 1:

– place test function in cell k10

2

a What is the data input range used for this question:

b

Why:
c. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:

Step 5:

Step 6: Conclusion and Interpretation
What is the p-value:

What is your decision: REJ or NOT reject the null?

Why?

3

a What is the data input range used for this question:

b Does this question need a one or two-tail hypothesis statement and test?
Why:
c. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:

Step 5:

Step 6: Conclusion and Interpretation
What is the p-value:

What is your decision: REJ or NOT reject the null?
Why?

4

Your findings:
The lecture’s related findings:
Overall conclusion:

Week 2: Identifying Significant Differences – part 1
To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located
or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.
As with our examination of compa-ratio in the lecture, the first question we have about salary between the genders involves equality – are they the same or different?
What we do, depends upon our findings.
As with the compa-ratio lecture example, we want to examine salary variation within the groups – are they equal? Use Cell K10 for the Excel test outcome location.
What is the data input range used for this question:
Which is needed for this question: a one- or two-tail hypothesis statement and test ?
Answer:
Why:
c. Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5: Conduct the test
Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
What is your conclusion about the variance in the population for male and female salaries?
Once we know about variance equality, we can move on to means: Are male and female average salaries equal? Use Cell K35 for the Excel test outcome location.
(Regardless of the outcome of the above F-test, assume equal variances for this test.)
Does this question need a one or two-tail hypothesis statement and test?
Conduct the test – place test function in cell K35
What is your conclusion about the means in the population for male and female salaries?
Education is often a factor in pay differences.
Do employees with an advanced degree (degree = 1) have higher average salaries? Use Cell K60 for the Excel test outcome location.
Note: assume equal variance for the salaries in each degree for this question.
Conduct the test – place test function in cell K60
Is the t value in the t-distribution tail indicated by the arrow in the Ha claim?
What is your conclusion about the impact of education on average salaries?
Considering both the compa-ratio information from the lectures and your salary information, what conclusions can you reach about equal pay for equal work?
Why – what statistical results support this conclusion?

Week 3

A B C D E F

To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located

or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.
1

a What is the data input range used for this question:

Ho:

Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:

Step 5:

Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?

2

to High

Why?

B-E

3

a What is the data input range used for this question:

b. Step 1: Ho:
Ha:
Step 2: Significance (Alpha):

Step 3: Test Statistic and test:

Why this test? A B C D E F

Step 4: Decision rule: Male 0
Step 5:

Female 0

0 0 0 0 0 0 0

Step 6: Conclusion and Interpretation

What is the p-value: A B C D E F
What is your decision: REJ or NOT reject the null? Male 0
Why? Female 0
What is your conclusion about the means in the population for male and female salaries? Sum: 0 0 0 0 0 0 0
4

Your findings:
The lecture’s related findings:
Overall conclusion:
Why – what statistical results support this conclusion?

Week 3: Identifying Significant Differences – part 2 Data Input Table: Salary Range Groups
Group name:
List salaries within each grade
A good pay program will have different average salaries by grade. Is this the case for our company?
Use Cell K08 for the Excel test outcome location.
Note: assume equal variances for each grade, even though this may not be accurate, for purposes of this question.
b. Step 1:
Conduct the test – place test function in cell K08
What is your conclusion about the means in the population for grade salaries?
If the null hypothesis in question 1 was rejected, which pairs of means differ?
(Use the values from the ANOVA table to complete the follow table.)
Groups Compared   Mean Diff.   T value used   +/- Term   Low to High   Difference Significant?
A-B
A-C
A-D
A-E
A-F
B-C
B-D
B-E
C-D
C-E
C-F
D-E
D-F
E-F
One issue in salary is the grade an employee is in – higher grades have higher salaries.
This suggests that one question to ask is if males and females are distributed in a similar pattern across the salary grades?
Use Cell K54 for the Excel test outcome location.
Place the actual distribution in the table below.
Sum
Conduct the test – place test function in cell K54
Sum:
Place the expected distribution in the table below.
What implications do this week’s analysis have for our equal pay question?

Week 4

To Ensure full credit for each question, you need to show how you got your results. This involves either showing where the data you used is located
or showing the excel formula in each cell. Be sure to copy the appropriate data columns from the data tab to the right for your use this week.

1

Use Cell K08 for the Excel test outcome location.

What is the data input range used for this question:

Are there any surprises – correlations you thought would be significant and are not, or non-significant correlations you thought would be?

2

a. What is the data input range used for this question:
b.

Ho:

Ha:

Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:
Step 5:

Step 6: Conclusion and Interpretation
What is the p-value:
What is your decision: REJ or NOT reject the null?
Why?
c.

Ho:
Ha:
Step 2: Significance (Alpha):
Step 3: Test Statistic and test:
Why this test?
Step 4: Decision rule:

Step 5: Conduct the test

Step 6: Conclusion and Interpretation

Midpoint Age

Raise Gender Degree

d.

e.

f.

3

4

Your findings:
The lecture’s related findings:
Overall conclusion:

5

Week 4: Identifying relationships – correlations and regression
What is the correlation between and among the interval/ratio level variables with salary? (Do not include compa-ratio in this question.)
a. Create the correlation table.
i.
ii. Create a correlation table in cell K08.
b. Technically, we should perform a hypothesis testing on each correlation to determine
if it is significant or not. However, we can be faithful to the process and save some
time by finding the minimum correlation that would result in a two tail rejection of the null.
We can then compare each correlation to this value, and those exceeding it (in either a
positive or negative direction) can be considered statistically significant.
i. What is the t-value we would use to cut off the two tails? T =
ii. What is the associated correlation value related to this t-value? r =
c. What variable(s) is(are) significantly correlated to salary?
d.
e. Why does or does not this information help answer our equal pay question?
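
For part b, one way to find the smallest correlation that counts as significant is to invert the usual t statistic for a correlation, t = r*sqrt(n-2)/sqrt(1-r^2), which gives r = t/sqrt(t^2 + n - 2). The sketch below assumes that relationship. The critical t of about 2.0106 (two-tail, alpha = 0.05, 48 degrees of freedom for a sample of 50) is passed in as an input, as Excel's =T.INV.2T(0.05, 48) would supply it, and the function name is hypothetical.

```python
import math

def min_significant_r(t_crit, n):
    # Smallest |r| that rejects Ho: no correlation, given the two-tail
    # critical t value and the sample size n (degrees of freedom = n - 2).
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Assumed inputs: n = 50 employees, critical t of about 2.0106 for df = 48.
r_min = min_significant_r(2.0106, 50)
print(round(r_min, 3))
```

Any correlation in the table whose absolute value exceeds this cutoff would then be treated as statistically significant.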
Perform a regression analysis using salary as the dependent variable and all of the variables used in Q1. Add the
two dummy variables – gender and education – to your list of independent variables. Show the result, and interpret your findings by answering the following questions.
Suggestion: Add the dummy variables values to the right of the last data columns used for Q1.
What is the multiple regression equation predicting/explaining salary using all of our possible variables except compa-ratio?
Step 1: State the appropriate hypothesis statements: Use Cell M34 for the Excel test outcome location.
Conduct the test – place test function in cell M34
What is your conclusion about the factors influencing the population salary values?
If we rejected the null hypothesis, we need to test the significance of each of the variable coefficients.
Step 1: State the appropriate coefficient hypothesis statements: (Write a single pair, we will use it for each variable separately.)
Note, in this case the test has been performed and is part of the Regression output above.
Place the t and p-values in the following table
Identify your decision on rejecting the null for each variable. If you reject the null, place the coefficient in the table.
Perf. Rat. Seniority
t-value:
P-value:
Rejection Decision:
If Null is rejected, what is the variable’s coefficient value?
Using the intercept coefficient and only the significant variables, what is the equation?
Salary =
Is gender a significant factor in salary?
Regardless of statistical significance, who gets paid more with all other things being equal?
How do we know?
After considering the compa-ratio based results in the lectures and your salary based results, what else would you like to know
before answering our question on equal pay? Why?
Between the lecture results and your results, what is your answer to the question
of equal pay for equal work for males and females? Why?
What does regression analysis show us about analyzing complex measures?

2018c Canvas Lecture Week 3 – 1a

BUS 308 Week 3 Lecture 1

Examining Differences – Continued

Expected Outcomes

After reading this lecture, the student should be familiar with:

1. Issues around multiple testing
2. The basics of the Analysis of Variance test
3. Determining significant differences between group means
4. The basics of the Chi Square Distribution.

Overview

Last week, we found out ways to examine differences between a measure taken on two
groups (two-sample test situation) as well as comparing that measure to a standard (a one-sample
test situation). We looked at the F test which let us test for variance equality. We also looked at
the t-test which focused on testing for mean equality. We noted that the t-test had three distinct
versions, one for groups that had equal variances, one for groups that had unequal variances, and
one for data that was paired (two measures on the same subject, such as salary and midpoint for
each employee). We also looked at how the 2-sample unequal variance t-test could be used in Excel
to perform a one-sample mean test against a standard or constant value. This week we expand
our tool kit to let us compare multiple groups for similar mean values.

A second tool will let us look at how data values are distributed – if graphed, would they
look the same? Different shapes or patterns often mean the data sets differ in significant ways
that can help explain results.

Multiple Groups

As interesting as comparing two groups is, often it is a bit limiting as to what it tells us.
One obvious issue that we are missing in the comparisons made last week was equal work. This
idea is still somewhat hard to get a clear handle on. Typically, as we look at this issue, questions
arise about things such as performance appraisal ratings, education distribution, seniority impact,
etc.

Some of these can be tested with the tools introduced last week. We can see, for
example, if the performance rating average is the same for each gender. What we couldn’t do at
this point, however, is see if performance ratings differ by grade: do the more senior workers
perform relatively better? Is there a difference between ratings for each gender by grade level?
The same questions can be asked about seniority impact. This week will give us tools to expand
how we look at the clues hidden within the data set about equal pay for equal work.

ANOVA

So, let’s start taking a look at these questions. The first tool for this week is the Analysis
of Variance – ANOVA for short. ANOVA is often confusing for students; it says it analyzes
variance (which it does) but the purpose of an ANOVA test is to determine if the means of
different groups are the same! Now, so far, we have considered means and variance to be two
distinct characteristics of data sets; characteristics that are not related, yet here we are saying that
looking at one will give us insight into the other.

The reason is due to the way the variance is analyzed. Just as our detectives succeed by
looking at the clues and data in different ways, so does ANOVA. There are two key variances
that are examined with this test. The first, called Within Group variance, is the average variance
of the groups. ANOVA assumes the population(s) the samples are taken from have the same
variation, so this average is an estimate of the population variance.

The second is the variance of the entire group, Between Group Variation, as if all the
samples were from the same group. Here are exhibits showing two situations. In Exhibit A, the
groups are close together, in fact they are overlapping, and the means are obviously close to each
other. The Between Group variation (which would be from the data set that starts with the
orange group on the right and ends with the gray group on the left) is very close to the Within
Group (the average) variation for the three groups.

So, if we divide our estimate of the Between Group (overall) variation by the estimate of
our Within Group (average) variation, we would get a value close to 1, and certainly less than
about 1.5. Recalling the F statistic from last week, we could guess that there is not a significant
difference in the variation estimates. (Of course, with the statistical test we do not guess but
know if the result is significant or not.)

Look at three sample distributions in Exhibit A. Each has the same within group
variance, and the overall variance of the entire data set is not all that much larger than the
average of the three separate groups. This would give us an F relatively close to 1.00.

Exhibit A: No Significant Difference with Overall Variation
[Figure: three overlapping group distributions; horizontal axis from -5 to 5, vertical axis from 0 to 0.45]

Exhibit B: Significant Difference with Overall Variation

Now, if we look at exhibit B, we see a different situation. Here the group distributions do
not overlap, and the means are quite different. If we were to divide the Between Group (overall)
variance by the Within Group (average) variance we would get a value quite a bit larger than the
value we calculated with the previous samples, probably large enough to indicate a difference
between the within and between group variation estimates. And, again, we would examine this F
value for statistical significance.

This is essentially what ANOVA does; we will look at how and the output in the next
lecture. If the F statistic is statistically significant (the null hypothesis of no difference is
rejected), then we can say that the means are different. Neat!
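
To make the ratio idea concrete, here is a minimal sketch of the standard single-factor ANOVA F statistic in Python. It is an illustration under assumptions, not the lecture's own method: it uses the usual mean-square formulation (between-group mean square divided by within-group mean square), which is slightly more specific than the "overall versus average variance" description above. The two data sets are made up to resemble an overlapping case and a well-separated case, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    # F statistic for single-factor ANOVA: MS between / MS within.
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical data: same spread inside each group, different spacing of means.
close_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]       # overlapping groups
far_groups = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]   # well-separated groups
print(one_way_anova_f(close_groups))
print(one_way_anova_f(far_groups))
```

The within-group variation is identical in both cases, so the much larger F for the separated groups comes entirely from the distance between the group means.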

So, why bother learning a new tool to test means? Why don’t we merely use multiple
t-tests to test each pair separately? Granted, it would take more time than doing a single test, but
with Excel that is not much of an issue. The best reason to use ANOVA is to ensure we do not
reduce our confidence in our results. If we use an alpha of 0.05, it is essentially saying we are
95% sure we made the right decision in rejecting the null. However, if we do even 3 t-tests on
related data, our confidence drops to P(Decision 1 correct and Decision 2 correct and Decision 3
correct). As we recall from week 1, the probability of three independent events all occurring is the
product of the individual probabilities, or .95*.95*.95 = 0.857! And in comparing means for 6 groups
(such as means for the different grade levels), we have 15 pairwise comparisons, which would reduce
our overall confidence that all decisions were correct to about 46% (0.95 raised to the 15th power).
Not very good. Therefore, a single ANOVA
test is much better for our confidence in making the right decision than multiple t-tests.
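
The arithmetic behind this argument can be checked with a short Python sketch. It assumes the individual test decisions are independent, which is a simplification, and the function names are hypothetical. The number of distinct pairs among k groups is counted as k(k-1)/2.

```python
import math

def pairwise_comparisons(k):
    # Number of distinct pairs that can be formed from k groups.
    return math.comb(k, 2)

def joint_confidence(num_tests, alpha=0.05):
    # Probability that every one of num_tests independent decisions is correct.
    return (1 - alpha) ** num_tests

print(round(joint_confidence(3), 3))               # three t-tests
pairs = pairwise_comparisons(6)                    # six grade levels
print(pairs, round(joint_confidence(pairs), 3))
```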

The hypothesis testing procedure steps are set up in a similar fashion to what we did
with the t-tests. There is a single approach to wording the null and alternate hypothesis
statements with ANOVA:

Ho: All means are equal

Ha: At least one mean differs.

The reason for this is simple. No matter how many groups we are testing, if a single mean
differs, we will reject the null hypothesis. And, it can get cumbersome listing all possible
outcomes of one or more means differing for the alternate.

One issue remains for us if we reject the null of no differences among the means: which
means are different? This is answered by constructing what we can call, for now, difference
intervals. A difference interval will give us a range of values that the “real” difference between
two means could be. Remember, since the means are from samples, they are close
approximations to the actual population means, which might be a bit larger or smaller than any
given sample mean. These difference intervals will take into account the possible sampling error we
have. (How we do this will be discussed in lecture 2 for this week.)

A difference interval might be -2 to +1.8. This says that the actual difference when we
subtract one mean from another could be any value between -2 to +1.8. Since this interval says
the difference could be 0 (meaning the means could be the same), we would find this pair of
means to be not significantly different. If, however, our difference range was, for example, from
+1.8 to +3.8 (all positive values), we would say the difference between the
means is significant as 0 is not within the range.
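
The decision rule for a difference interval is simple enough to express as a one-line check. The sketch below only tests whether 0 lies inside a given interval; how the interval endpoints are built is left to the later lecture, and the function name is hypothetical.

```python
def difference_is_significant(low, high):
    # If 0 falls inside the interval, the two means could be equal,
    # so the difference is not considered significant.
    return not (low <= 0 <= high)

print(difference_is_significant(-2.0, 1.8))   # interval contains 0
print(difference_is_significant(1.8, 3.8))    # interval excludes 0
```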

ANOVA is a very useful tool when we need to compare multiple groups. For example,
this can be used to see if average shipping costs are the same across multiple shippers. The
average time to fill open positions using different advertising approaches, or the associated costs
of each, can also be tested with this technique. With our equal pay issues, we can look at mean
equality across grades of variables such as compa-ratio, salary, performance rating, seniority, and
even raise.

Chi Square Tests

The ANOVA test somewhat relies upon the shape of the samples, both with our
assumption that each sample is normally distributed with an equal variance and with their
relative relationship (how close or distant they are). In many cases, we are concerned more with
the distribution of our variables than with other measures. In some cases, particularly with
nominal labels, distribution is all we can measure.

In our salary question, one issue that might impact our analysis is knowing if males and
females are distributed across the grades in a similar pattern. If not, then whichever gender holds
more higher-level jobs would obviously have higher salaries. While this might be an affirmative
action or possible discrimination issue, it is not an equal pay for equal work situation.

So, again, we have some data that we are looking at, but are not sure how to make the
decision if things are the same or not. And by examining means alone, we cannot tell anything
about how the variables are distributed.

But, have no fear, statistics comes to our rescue! Examining distributions, or shapes, or
counts per group (all ways of describing the same data) is done using a version of the Chi Square
test; and, after setting up the data Excel does the work for us.

In comparing distributions, we simply count how many values are in each group or range. We can
do this with discrete variables (such as the number of employees in each grade) or with continuous
variables (such as age or years of service, which can take any value within a range if measured
precisely enough) that we divide into ranges. For something like the
distribution of gender by grades, simply count how many males and females are in each grade;
this is simple even if a bit tedious. For something like compa-ratio, we first set up the range values we
are interested in (such as .80 up to but not including .90, etc.), and then count how many values
fall within each group range.
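
The counting step described above can be sketched in Python as follows. The employee records, compa-ratio values, and range edges are all made up for illustration, and the helper name is hypothetical. Ranges are treated as "lower edge up to but not including the upper edge", matching the .80 to .90 example.

```python
from collections import Counter

# Hypothetical (gender, grade) records, for illustration only.
employees = [("M", "A"), ("F", "A"), ("F", "A"), ("M", "B"), ("F", "C"), ("M", "C")]
gender_by_grade = Counter(employees)   # counts per (gender, grade) cell

def bin_label(value, edges):
    # Label of the half-open range [lo, hi) that contains value, or None.
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo:.2f}-{hi:.2f}"
    return None

# Hypothetical compa-ratios and range edges.
compa_ratios = [0.85, 0.89, 0.90, 1.05, 1.12]
edges = [0.80, 0.90, 1.00, 1.10, 1.20]
range_counts = Counter(bin_label(c, edges) for c in compa_ratios)

print(gender_by_grade[("F", "A")])
print(range_counts)
```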

These counts are displayed in tables, such as the following on gender distribution by
grade. The first is the distribution of employees by grade level for the entire sample, and the
second is the distribution by gender. The question we ask for both kinds of tables is basically
the same: is the difference enough to be statistically significant or meaningfully different from
our comparison standard?

          A    B    C    D    E    F
Overall   15   7    5    5    12   6

          A    B    C    D    E    F
Male      3    3    3    2    10   4
Female    12   4    2    3    2    2

The answer to the question of whether the distributions are different enough, when using
the Chi Square test, depends on the group we are comparing the distribution with. When we
are dealing with a single row table, we need to decide what our comparison group or distribution
is. For example, we could decide to compare the existing distribution or shape against a claim
that the employees are spread out equally across the 6 grades with 50/6 = 8.33 employees in each
grade. Or we could decide to compare the existing distribution against a pyramid shape – a more
typical organization hierarchy, with the most employees at the lower grades (A and B) and fewer
at the top; for example, 17, 10, 8, 7, 5, 3. The expected frequency per cell does not need to be a
whole number. What is important is having some justification for the comparison distribution
we use.
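
As an illustration of comparing the single-row table against the two comparison distributions mentioned above, the sketch below computes the usual chi-square statistic, the sum of (observed - expected)^2 / expected over the cells. It stops at the statistic and does not compute a p-value; the function name is hypothetical.

```python
def chi_square_stat(observed, expected):
    # Sum over all cells of (observed - expected)^2 / expected.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 7, 5, 5, 12, 6]                   # overall counts for grades A-F
equal_split = [sum(observed) / len(observed)] * len(observed)   # 50/6 in each grade
pyramid = [17, 10, 8, 7, 5, 3]                    # pyramid-shaped comparison

print(round(chi_square_stat(observed, equal_split), 2))
print(round(chi_square_stat(observed, pyramid), 2))
```

Both comparison distributions sum to the same total (50) as the observed counts, which the test requires.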

When we have multi-row tables, such as the second example with 2 rows, the comparison
group is known or considered to be basically the average of the existing counts. We will get into
exactly how to set this up in the next lecture. In either case the comparison (or “expected”)
distribution needs to have the row and column total sums to be the same as the original or actual
counts.
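
For a multi-row table, a common way to build the expected distribution is to set each cell to (row total x column total) / grand total, which keeps the row and column sums equal to the actual ones. The sketch below assumes that standard formula and applies it to the gender-by-grade counts shown earlier; the function name is hypothetical.

```python
def expected_counts(table):
    # Expected cell = row total * column total / grand total.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

actual = [
    [3, 3, 3, 2, 10, 4],    # Male counts for grades A-F
    [12, 4, 2, 3, 2, 2],    # Female counts for grades A-F
]
expected = expected_counts(actual)
for row in expected:
    print(row)
```

Because each gender has 25 employees here, every expected cell is simply half of the grade's total count.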

The hypothesis claims for either chi square test are basically the same:

Ho: Variable counts are distributed as expected (a claim of no difference)

Ha: Variable counts are not distributed as expected (a claim that a difference exists)

Comparing distributions/shapes has a lot of uses in business. Manufacturing generally
produces parts that have some variation in key measures; we can use the Chi Square to see if the
distribution of these differences from the specification value is normally distributed, or if the
distribution is changing over time (indicating something is changing – such as machine
tolerances). The author used this approach to compare the distribution/pattern of responses to
questions on an employee opinion survey between departments and the overall division.
Different response patterns suggested the issue was a departmental one while similar patterns
suggested that the division “owned” the results, indicating which group should develop ways to
improve the results.

Summary

This week we looked at two different tests, one that looks for mean differences among
two or more groups and one that looks for differences in patterns, distributions, or shapes in the
data set.

The Analysis of Variance (ANOVA) test uses the difference in variance between the
entire data set and the average variance of the groups to see if at least one mean differs. If so, the
construction of difference intervals will tell us which of the pairs of means actually differ.

The Chi Square tests look at patterns within data sets and let us compare them to a
standard or to each other.

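The ANOVA test is found in the Data Analysis link in Excel and follows the same basic set-up
process as we saw with the F and t-tests last week. The Chi Square test is not in the Data Analysis
list of tools, so its set-up is a bit different.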

If you have any questions on this material, please ask your instructor.

After finishing with this lecture, please go to the first discussion for the week, and engage
in a discussion with others in the class over the first couple of days before reading the second
lecture.

BUS308 W3 Lecture – 3A

BUS 308 Week 3 Lecture 3

Setting up ANOVA and Chi Square

Expected Outcomes

After reading this lecture, the student should know how to:

1. Set-up the data for an ANOVA analysis.
2. Set-up and perform an ANOVA test.
3. Set-up a table of mean differences.
4. Set-up and perform a Chi Square test.

Overview

Setting up the ANOVA test is quite similar to how the t and F tests were set up. The Chi
Square set-up is a bit more complex, as it is not found in the Data Analysis list of tools.

ANOVA

The set-up of ANOVA within Excel is very similar to how we set up the F and T tests
last week; place the data set in appropriate groups and then use the ANOVA input box. One
difference this week is that the Fx (or Formulas) list does not include an option for ANOVA, so
we need to use the Data | Analysis tools.

Data Set-up

Single Factor. As with the t-test, ANOVA has a couple of versions to select between.
Each is used to answer slightly different questions, and these will be examined below. The most
significant difference lies in the data table used for each version.

We will be working primarily with the ANOVA Single Factor, which deals with
examining possible differences between the means of a single variable within different groups.
A question of whether or not the mean compa-ratios are equal across the grades is an example of
the kind of question answered with this approach.

Question 1. Week 3’s first question is about salary mean equality across the grades. Our
lecture example will deal with compa-ratio mean equality across the grades. The set-up for the
Single Factor ANOVA we just went through assumed this. The initial steps in the hypothesis
testing process are similar to what we have done before:

Step 1: Ho: All Compa-Ratio means are equal across the grades

Ha: At least one compa-ratio mean differs

Notice that these are the standard ANOVA – Single factor null and alternate hypothesis
statements that identify the specific variable (compa-ratio) and statistic (mean) that we
are testing, and merely say “no difference” and “at least one differs.”

Step 2: Alpha = 0.05

Step 3: F statistic and Single Factor ANOVA; used to test multiple means

Step 4: Decision Rule: Reject Ho if the p-value < 0.05

Step 5: Conduct the test – place the test function in cell K08.

As with the F and T tests, we need to group the data into distinct groups. For example, if
we are going to test the compa-ratio mean across grades, then the data must be set up in a table
with grades across the top, as in the screen shot below.

Note that, as was done with the T and F test input data, the raw or initial data was listed
and then sorted. Values were then copied into related groups; we used male and female groups for
the F and t tests and grade groups for this test.

Test Set-up. Going to Data | Analysis and selecting ANOVA Single Factor gives us the
following input screen. This is completed for our compa-ratio test.

Notice that the entire table range, including the column labels, is entered into the Input
box as a single entry. We do need to check the labels box, as Excel needs to be explicitly told that
some of the data range is not numeric. Our normal alpha value of 0.05 is automatically filled in,
but you can change this value. The last entry is where we want the output table to start. As
with the T and F tests, this cell is the upper left corner of the output and is given as K08 for
question 1 this week.

Clicking OK gives us the data output that we examined in Lecture 2 for this week.

Here is a video on ANOVA: https://screencast-o-matic.com/watch/cb6jecIkLg

Other ANOVA Versions

Two-Factor. While we will not work with either of the two-factor forms, a brief
explanation will help show the difference and usefulness of these forms. The ANOVA Two-Factor
without replication allows us to test the means of two factors at once. An example of this
kind of question might be: are the compa-ratio means equal across grades when sorted by gender?
The outcome of this test gives us the significance of each group (grade average and gender
average) as if the other variable was held constant.
In other words, it removes some of the variation on what we are measuring. A data set-up table
for this version might look like this:

            A     B     C     D     E     F
Male
Female

The values in each cell would be a measure for each cell; for example, male salary in grade
A. For situations where we have multiple values, we could use the average or median value.

For the with replication version, the more significant test is to see if the variables
interact with each other rather than simply examining mean equality. This requires multiple
data points for each of the groups (females in grade C, for example).

            A     B     C     D     E     F
Male
Female

The values in each cell would be measures for each group. For example, we could use the
minimum, maximum, and mean for each grade and gender group.

For more information on these versions of ANOVA, please go to some web-based statistics
sites. The data input for the with and without replication versions is quite similar – the entire
data input box including any top and side labels.

Question 2. This question asks for the mean difference intervals so we can identify the
significantly different grade means. The formula for developing the range to examine mean
differences is:

(mean1 – mean2) +/- t * sqrt(mse * (1/n1 + 1/n2))

Ok – breathe. Most of the values we need are in the ANOVA table, and Excel will let us set
up a table and do all these actions one step at a time. The completed table was examined in
Lecture 2, so let’s step back from the complex table and develop it one cell at a time. This is the
same as the old adage “How do we eat an elephant? One bite at a time.”

Before starting on the table, we need to recall where the different outcomes are in our
ANOVA table. See the screenshot below – this is the same as in Lecture 2, with some of the
values we will be using bolded for easy identification.

Now, let’s take a look at setting up the values in the table.
The following screen shot is of the same table, but different cells display the formulas used
to create the values rather than the values. This can help us see the relationships.

Let’s take a look at each column and see how the calculations are set up. Row 31 contains
the names of the values we want in each column, starting with the groups we want to compare.

Going down Column B, we simply list the grade pairs we will look at in each row, such as
A-B (comparing grades A and B), etc. Just set up a convenient label telling us what the row
refers to, something like A-B.

Column C, labeled Mean Diff., is where we set up our first values, the difference between
the two means. The generic formula is =ABS(Mean1 – Mean2).

• The ABS function provides the absolute value, always providing a positive difference and
eliminating any negative signs (as if we always subtracted the smaller value from the larger
value). This is not needed; the author just likes it.
• The A-B row (row 32) shows =ABS($N$11 – N12). These cell references refer to the mean
values located in the Summary table from our ANOVA results. Cell N11 refers to the mean of
grade A, while cell N12 refers to the mean of grade B.
• The next row contains the reference to grade A (N11) but changes the second reference to
the location of the grade C mean (N13). Repeat this pattern all the way down the table,
referencing the two grades being compared in each row.
• Don’t worry about the dollar signs right now; we will cover these after we have completed a
full row of formulas.

In column D, we have the t-value used to provide our confidence in the range outcomes.
Since we are building our ranges based on the ANOVA results, the df for every row remains the
same rather than changing with each pair of grades.
The formula for finding a specific t-value based on a desired probability and df is
“=T.INV.2T(alpha, df).” We are using the 2-tail value for t as we want to cut off values at both
ends of our range, rather than just focusing on one end. Since we want a 95% interval (consistent
with our alpha = 0.05), use .05 to tell Excel what percent to cut off from the extremes (0.025 in
each tail) of the t distribution. The df for each pair is the df associated with the within groups
variation, found in cell M23 and equaling 44. The resulting cell formula becomes:
=T.INV.2T(0.05, $M$23). We can use the same copying approach to copy this value to the end of
the table.

Note: The lower the alpha used, the higher our level of confidence and the larger the range.
A 100% confidence results in a range from – infinity to + infinity, of no help whatsoever. A larger
alpha value gives us a smaller interval and less confidence that the range contains the actual
difference of the means within the population.

Column E develops our range constant that is added to and subtracted from the mean
difference. This is similar to a margin of error that we discussed earlier. The general formula for
the cell entries in this column is: =t*SQRT(MSwg*(1/count1 + 1/count2)), where MSwg is the MS
value for the Within Group row from the ANOVA table, and 1 and 2 refer to the groups being
compared. For the comparison of Grades A and F shown in Row 36, the specific formula shows
=D36*SQRT($N$23*(1/$L$12 + 1/L17)).

• D36 refers to the t-value found in column D. (You could enter the actual t value, use an
absolute reference to a single cell, or use the value in each row – they all work.) The SQRT is
Excel’s code for taking the square root of whatever is within the ( ).
• The $N$23 is the cell reference to the MSwg measure in the ANOVA table. This is the
common variance estimate for the samples, so adding the $ makes sense.
• The (1/$L$12 and 1/L17) are the references to the counts for grades A and F that are found
in the Summary part of the ANOVA output.

Now, let’s develop the ranges. The low-end value of the difference range (column F) equals
the Mean Diff. (column C) minus the +/- term (column E), so the formula for row 31 would be
=C31 – E31; for row 32, the values change to =C32 – E32, etc. The high-end value (column H)
for the range equals column C + column E, or =C31 + E31, etc. We discussed how to interpret the
significance of each interval in Lecture 2 and will not repeat that here.

Now, to make things a bit easier. Notice the dollar signs around some of the cell references,
for example, the dollar signs found in $N$12; these are made by typing N12 and then pressing
F4. These tell Excel that if we copy this cell, it should keep N12 as a constant. Without these,
copying the cell would change values we want to remain the same.

What does this mean? If you want to try copying cells rather than writing the formula in
each cell, try the following.

• Using just cell C31, move the cursor to the bottom right corner of the cell. When it is
placed correctly at the corner, the cursor will change to a small +.
• When you see the +, depress the left mouse button and pull the cursor down one cell to C32.
• You should now see =($N$11 – N13) rather than =($N$11 – N12). The relative reference of
cell N12 went down 1 row as you pulled the cell down one row.

What this means is that after you set up the entire row 31 (from column C thru column I),
you can highlight the entire range, place the cursor on the far-right corner, and after you see the +
drag all of the cells down from row 31 to row 38, where we start to compare grade B. First, delete
the mess in row 37, which is just a separator row. Then in cell C38, change the references to $N$12
and N13 (for grades B and C), and do the same in cell E38 to the related counts in $L$12 and L13.
Highlight and drag the range down to C42 and make the appropriate adjustments again. Do this
until you have reached and edited the cells in row 49. You should now have all the table
calculations done, and are ready to make your comparison decisions in columns J and K.

Note: when your cursor is on a cell reference in a formula, such as N11 in =N11, pressing F4
will place $ signs in front of both the column and row. Pressing F4 a second time places the $
sign in front of the row value only; pressing it a third time places the $ sign in front of the
column value only. Pressing it a fourth time removes all of the $ signs.

Chi Square Tests

This lecture will look at setting up two related Chi Square tests. The first, called the
Goodness of Fit Test, involves a single row of counts, such as with the dice example we discussed
in Lecture 2 for week 1. This form of the test would answer a question such as: are the dice
we tossed fair – that is, did we get the distribution for each face that we expected? The second,
called the Contingency Table analysis, involves multiple rows in the table, such as we might have
if we looked at how degrees (undergraduate and graduate) are distributed across the grades. Both
Chi Square statistical values are calculated the same way.

Both of these tests will use counts (how many) rather than the measurements (how much)
we have been using to date. The Chi Square tests use the difference between an actual
distribution/counts and an expected distribution to reach decisions on the similarity or difference
in patterns.

The Chi Square distribution examines the differences between what we see (actual counts
per group) and what we expect in each group. Once we have these two counts, the actual
calculation of the Chi Square statistic (which Excel can do for us automatically) is:

∑ (Observed count – Expected count)^2 / (Expected count)

This is simply the sum (∑), over all cells, of the squared difference between what we saw and
what we expected, divided by the expected count for that cell.
The Chi Square statistic is also evaluated with a degree of freedom measure that varies with
each test.

The expected values are obviously critical to outcomes with this test, and they can be
developed in several different ways if they are not already known. These approaches depend
upon the complexity of the situation and will be discussed below.

Two input tables are required for all Chi Square test set-ups. The first table is the “actual”
or “observed” counts, a table showing how many items fit into each group we care about. The
second is a table showing the expected counts.

Example

The assignment does not ask for a simple 1 row table of counts, a Goodness of Fit test, but
we will start with this simple example first. In the goodness of fit test, our table is a single row
showing the counts.

Recall from week 1 that we looked at how many times each value from the showing faces
of a pair of dice showed up when we tossed the pair of dice 50 times. We got the following
distribution of scores.

Outcomes from tossing a pair of dice
Count showing     2   3   4   5   6   7   8   9  10  11  12
Frequency seen    1   2   4   3   9  12   7   5   4   1   2

In the language of a Chi Square test, the frequency seen row would be called the “Actual”
data; it is simply the count of how many we see that fit any criteria, such as sum of dots on the
showing faces of the dice. Typically, the Actual counts are easy to get: simply count what is seen.

The “Expected” counts are sometimes harder to figure out. For example, what is the
expected number of 2’s when we toss the dice 50 times? Why? We could say we expect each value
to occur the same number of times and use 50/11 (number of possible outcomes) as the expected
value. In some situations, this would be fine (note: expected values do not need to be whole
numbers). In this case, that is probably not the best choice. Fortunately, probability theory can
give us an answer.
There are 36 possible outcome combinations – we have 6 outcomes for die 2 for each of the
6 outcomes on die 1; 6 * 6 = 36. So for a run of 36 tosses, a “perfect” distribution showing each
of the possible outcomes would look like:

Count showing     2   3   4   5   6   7   8   9  10  11  12
Expected          1   2   3   4   5   6   5   4   3   2   1

To translate this to a run of 50, we would multiply each frequency by 50/36. So our
Expected outcome would look like (rounded to 2 decimal places):

Count showing      2     3     4     5     6     7     8     9    10    11    12
Actual             1     2     4     3     9    12     7     5     4     1     2
Expected        1.39  2.78  4.17  5.56  6.94  8.33  6.94  5.56  4.17  2.78  1.39

Going to the Fx Statistical list and picking CHISQ.TEST(actual range, expected range), we
get a value of 0.877. This result is our p-value: the probability of getting a Chi Square value as
large or larger than the one we have if the null hypothesis is true. So, if we were testing a null
hypothesis of No difference from Expected, we would not reject this null. Based on these 50
tosses, the dice cannot be said to be unfair or biased.

You could calculate the Chi Square statistic long hand; for this example it would be:

Chi Square = ((1 – 1.39)^2)/1.39 + ((2 – 2.78)^2)/2.78 + … + ((2 – 1.39)^2)/1.39 = 5.2.

The Chi Square df for a single row table is (number of cells – 1) or (11 – 1) = 10 for this
example.

Now, Excel can find the Chi Square value using the p-value found from CHISQ.TEST by
using CHISQ.INV.RT(probability, df). Since we have the p-value, which is the probability in the
right tail of our distribution, we use the RT (right tail) version of the Chi Square inverse
function to find the cut-off value: =CHISQ.INV.RT(0.877, 10) = 5.2.

Example – Question 3

The third question for this week asks about employee grade distribution. We are
concerned here about the possible impact of an uneven distribution of males and females in
grades and how this might impact average salaries.
If employees are not distributed in a similar pattern, we can expect that this grade difference
could be a factor in the observed salary difference. While we are concerned about an uneven
distribution, our null hypothesis is always about equality, so the null would respond to a question
such as: are males and females distributed across the grades in a similar pattern, or are
either males or females more likely to be in some grades rather than others?

A similar question can be asked about degrees: are graduate and undergraduate degrees
distributed across grades in a similar pattern? If not, this might be part of the cause for unequal
salary averages.

The data for this test would be found in a contingency table with rows showing the degree
and columns showing grades. Set-up of this table is fairly simple and involves copying the
variables we want (grade and Deg, in this example), sorting them by grade and then Deg, and
simply counting how many fit each cell (degree – grade match). Our final actual count table is
shown below.

            A    B    C    D    E    F   Total
UnderG      7    5    3    2    5    3    25
Grad        8    2    2    3    7    3    25
Total      15    7    5    5   12    6    50

The second table for each form is the expected value table. It will have the same row and
column totals as the actual table has. This is an important check to ensure that the tables are set
up correctly.

The set-up of the Contingency Table Expected values is slightly more complicated than
for the Goodness-of-Fit expected table. In general, we do not have a specific expected frequency
count for these tables, so we need to create them using the information available to us from the
Actual table. For each cell in the Expected table, we multiply its row total times its column total
and divide by the grand total (50).
For example, in the above table, the expected entry for Grad in grade D would be the Grad
total (25) times the Grade D total (5) divided by the grand total (50); this gives us 25*5/50 = 2.5
for that cell. We can use the cell formulas shown below to create the first column values, and drag
them across the rows thru grades B to F. See the screen print below.

Now that we have our data tables created, we can look at performing the Chi Square
Contingency Table analysis using the hypothesis testing procedure.

Step 1: Ho: Grad and Undergrad degrees are distributed in a similar fashion.

Ha: Grad and Undergrad degrees are not distributed in a similar fashion.

(Note that an alternate wording could be that Degrees and grades are unrelated (not
correlated) versus the alternate that they are significantly correlated. Both interpretations
are appropriate for the contingency table test.)

Step 2: Alpha = 0.05

Step 3: Chi Square statistic and Contingency table test, used for count data

Step 4: Decision Rule: Reject the null hypothesis if the p-value is < 0.05.

Step 5: Conduct the test. As with the F and T-tests, we use the Fx (or Formulas) list of
statistical tools. The CHISQ.TEST function has inputs for the Actual and Expected ranges and
returns the p-value. This data entry is exactly the same as we saw in the F and T-test examples
last week. The Chi Square test does not have a function listed in the Data | Analysis functions.

We get a p-value of 0.85 (rounded) using =CHISQ.TEST(L58:Q59,L63:Q64). Note that the
total row and total column values are NOT included in the data ranges. (See the above screen
print of the input tables.)

Step 6: Conclusion and Interpretation

What is the p-value? 0.85

Decision on rejecting the null: Do Not Reject the null hypothesis

Why? P-value is > 0.05.

Conclusion on impact of degrees? Degrees are distributed equally across the grades and do not
seem to have any correlation with grades. This suggests they are not an important factor in
explaining differing salary averages among grades.

Here is a video on Chi Square: https://screencast-o-matic.com/watch/cb6jffIk8T

NOTE: There are some issues with both versions of the Chi Square test when we have
20% or more of the cells with expected values less than 5. In most cases, this presents a p-value
that is too small, potentially causing incorrect rejections of the null. There are conflicting
recommendations on what to do with this issue. Some say make what is called the Yates’
correction (do a search on this), others say combine columns to reduce the number of small cells,
and still others say just be aware of this if your rejection p-value is close to alpha. We are
choosing not to emphasize this issue, but merely leave it up to you to investigate if it becomes a
concern in your professional life.
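
The 20% guideline in the note above can be checked directly once an expected table exists. The following is a minimal Python sketch, not part of the course (which works in Excel); the variable names are hypothetical. It rebuilds the expected counts for the degree-by-grade table from this lecture and reports the share of expected cells below 5.

```python
# Actual counts from the degree-by-grade table in this lecture (grades A-F).
actual = {
    "UnderG": [7, 5, 3, 2, 5, 3],
    "Grad":   [8, 2, 2, 3, 7, 3],
}

row_totals = {deg: sum(counts) for deg, counts in actual.items()}
col_totals = [sum(col) for col in zip(*actual.values())]
grand_total = sum(row_totals.values())

# Expected count for each cell = row total * column total / grand total.
expected = {
    deg: [row_totals[deg] * c / grand_total for c in col_totals]
    for deg in actual
}

# Share of expected cells that fall below 5 (the guideline threshold is 20%).
cells = [e for row in expected.values() for e in row]
small_share = sum(1 for e in cells if e < 5) / len(cells)

print(f"{small_share:.0%} of expected cells are below 5")
```

For this table, 8 of the 12 expected cells come out below 5, so the caution described in the note applies to the degree example.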

Question 4

Having looked at grade mean differences for compa-ratios and educational degree
distribution, neither seems to help answer our equal pay question. The compa-ratios show that
not all of the grades have an equal average, with some senior grades having higher averages than
the lower grades. This could be due to poorly aligned midpoints (higher midpoints would lower
the average compa-ratios in those grades) or to a pattern of paying relatively more for the higher
graded work. We do not know right now. At any rate, since none of this week’s analysis
focused on gender, we have not really gained any additional insights into pay practices based on
gender.

Summary

In most respects setting up the ANOVA test is similar to what we did with the F and
t-tests. The principal difference lies with the number of columns we have. The input data table
for ANOVA should have multiple columns each headed by a group name (such as A, B, C, etc.
for our grades) with the data values for each group listed below (such as all grade A salaries
listed under the A label, etc.). The set-up window for ANOVA will have the entire data range
(labels and values) entered as a single range (such as G1:K12). ANOVA is found in the Data |
Analysis tab.

The set-up for the Chi Square tests is a bit more complicated as it involves not only the
actual data being set up in one table but also the expected values that are used for comparison
purposes being set up in a separate table. Both tables consist of counts rather than actual values
from the data set – for example, the number of employees in each grade.

The expected distribution table set-up differs depending upon which Chi Square test we are
doing. If we are comparing a single distribution (such as number of employees per grade), we
would set-up a single row expected table that matched the distribution we were concerned with;
possibly equal number in each grade, or a decreasing number in each grade such as a pyramid
might have, or more in the middle, etc.
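
As an illustration of the single-row case, the pair-of-dice example from this lecture can be reproduced outside Excel. The following is a minimal Python sketch under stated assumptions: Python is not used in the course, the names are hypothetical, and the helper `chi_square_right_tail` is a stand-in for Excel's CHISQ.TEST that uses a closed form valid only for even degrees of freedom (which is the case here, df = 10).

```python
import math
from collections import Counter

# Actual counts for sums 2..12 from the 50 tosses in the lecture example.
actual = [1, 2, 4, 3, 9, 12, 7, 5, 4, 1, 2]
tosses = sum(actual)  # 50

# Expected counts: enumerate the 36 equally likely (die 1, die 2) combinations,
# then scale each frequency by tosses / 36.
combos = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
expected = [combos[total] * tosses / 36 for total in range(2, 13)]

# Chi Square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(actual, expected))
df = len(actual) - 1  # 10 for a single row of 11 cells


def chi_square_right_tail(x, df):
    """Right-tail probability of a Chi Square value (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this sketch only handles even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


p_value = chi_square_right_tail(chi_square, df)
print(round(chi_square, 2), df, round(p_value, 3))
```

The statistic and p-value printed here should agree with the 5.2 and 0.877 reported for the dice example.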

If, however, we are looking at comparing several distributions, such as males and females
across the grades, the expected table is generated using the actual distribution. For each cell in
the expected table, we would find the value of the row total * the column total divided by the
grand total for the respective values in the actual table.
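
The row total * column total / grand total rule can be sketched in a few lines. The following Python example is an illustration only (the course itself uses Excel cell formulas, and the function name `expected_counts` is hypothetical). It builds the expected table for the degree-by-grade counts from this lecture and confirms that the expected table has the same row and column totals as the actual table.

```python
def expected_counts(actual):
    """Return the expected table: row total * column total / grand total."""
    row_totals = [sum(row) for row in actual]
    col_totals = [sum(col) for col in zip(*actual)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Actual degree-by-grade counts from this lecture
# (rows: UnderG, Grad; columns: grades A-F).
actual = [
    [7, 5, 3, 2, 5, 3],
    [8, 2, 2, 3, 7, 3],
]
expected = expected_counts(actual)

# Check: the expected table must reproduce the actual row and column totals.
assert [sum(r) for r in expected] == [sum(r) for r in actual]
assert [sum(c) for c in zip(*expected)] == [sum(c) for c in zip(*actual)]

for label, row in zip(["UnderG", "Grad"], expected):
    print(label, row)
```

The Grad entry for grade D comes out as 2.5, matching the 25*5/50 calculation worked in the lecture.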

In both cases, the Chi Square set-up (found in the Fx or Formula links) asks us to identify
the range of the actual values and then the range of the expected values.

Please ask your instructor if you have any questions about this material.

When you have finished with this lecture, please respond to Discussion thread 3 for this
week with your initial response and responses to others over a couple of days before reading the
third lecture for the week.

BUS308 W3 Lecture – 2A

BUS308 Week 3 Lecture 2

Examining Differences – ANOVA and Chi Square

Expected Outcomes

After reading this lecture, the student should be familiar with:

1. Conducting hypothesis tests with the ANOVA and Chi Square tests
2. How to interpret the Analysis of Variance test output
3. How to determine significant differences between group means
4. The basics of the Chi Square Distribution.

Overview

This week we introduced the ANOVA test for multiple mean equality and the Chi Square
tests for distributions. This lecture will focus on interpreting the outcomes of both tests. The
process of setting them up will be covered in Lecture 3 for this week.

ANOVA

Hypothesis Test

Week 3’s question 1 asks if the average salary per grade is equal. While this might
seem like a no-brainer (we expect each higher grade to have a higher average salary), we need to
test all assumed relationships. This is much like our detectives saying “we need to exclude you
from the suspect pool; where were you last night?” This example will, of course, use the
compa-ratio instead of the salary values you will use in the homework.

The ANOVA test is found in the Data | Analysis tab.

Step 5 in the hypothesis testing process asks us to “Perform the test.” Here is a screen
shot of the ANOVA output for a test of the null hypothesis: “All grade compa-ratio means are
equal.” For this question we will be using the ANOVA-Single Factor option as we are testing
mean equality for a single factor, Grades. We will briefly cover the other ANOVA options in
Lecture 3 for this week.

Note that the ANOVA single factor output includes the test name, a summary table, and
an ANOVA table. The summary table gives us the count, sum, average, and variance for the
compa-ratios by the analysis groups (in this case our grades). Note that we are assuming equal
variances within the grades within the population for this example, and your assignment. This
may not actually be true for this example (note the values in the Variance column), but we will
ignore this for now. ANOVA is somewhat robust around violations of the variance equality
assumption, meaning it may still produce acceptable results with unequal variances. There is a
non-parametric alternative if the variances are too different, but we do not cover it in this course.
Please note that the column letters and row numbers are present in this screenshot. These will be
needed as references in question 2.

The next table is the meat of the test. While for all practical purposes, we are only
interested in the highlighted p-value, knowing what the other values are is helpful. When we
introduced ANOVA in lecture 1, we discussed the between and within groups variation. As you
recall, the between groups focused on the data set as a single group and not distinct groups. For
the Between Groups row, we have a Sum of Squares (SS) value, which is a raw estimate of the
variation that exists. The degrees of freedom (df) for Between Groups equals the number of
groups (k) we have minus 1 (k-1), which equals 5 for our 6 groups. The Mean Square variation
estimate equals the SS divided by the df.

The Within Group focuses on the average variation for all our groups. SS gives us the
same raw estimate as for the BG row. The df for Within Groups is the total count (N) minus the
number of groups (N-k), or 44 for our 50 employees in the 6 groups. MSwg equals SS/df.

The F statistic is calculated by dividing the MSbg by MSwg. The next column gives us
our p-value followed by the critical value of F (the value at which the p-value would be exactly
0.05). The total line is the sum of the SS values and the overall df, which equals the total
count – 1 (N – 1).
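
To make the pieces of the ANOVA table concrete, here is a minimal Python sketch that builds SS, df, MS, and F by hand. It is an illustration only: the three groups and their values are hypothetical (they are NOT the course data), and the variable names are assumptions. Excel's ANOVA Single Factor tool produces the same quantities, plus the p-value and critical F, which this sketch does not compute.

```python
# Hypothetical measurements for three groups (NOT the course data set).
groups = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [5.0, 6.0, 7.0],
}

all_values = [v for values in groups.values() for v in values]
n_total = len(all_values)          # N
k = len(groups)                    # number of groups
grand_mean = sum(all_values) / n_total

# Between Groups: variation of the group means around the grand mean.
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)
# Within Groups: variation of each value around its own group mean.
ss_within = sum(
    (v - sum(values) / len(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

print(df_between, df_within, round(f_statistic, 2))
```

With the lecture's 6 grades and 50 employees, the same two df lines would give 5 and 44.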

(As with the t and F tests, we could make our decision by comparing the calculated F
value (in cell O20) with the critical value of F (in cell Q20). We reject the null when the
calculated F is greater than the critical F. The critical value of F or any statistic in an Excel
output table is the value that exactly provides a p-value equaling our selected value for Alpha.
However, we will continue to use the P-value in our decisions.)

Now that we have our test results, we can complete step 6 of the hypothesis testing
procedure.

Step 6: Conclusions and Interpretation

What is the p-value? Found in the table, it is 0.0186 (rounded).

(Side note: at times Excel will produce a p-value that looks something like 3.8E-14. This
is called the scientific or exponential format. It is the same as writing 3.8 * 10^-14 and
equals 0.000000000000038. A simple way of knowing how many 0s go between the
decimal point and the first non-zero number is to subtract 1 from the E value, so with
E-14, we have 13 zeros. At any rate, any Excel p-value using E-xx format will always be
less than 0.05.)
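
The zero-counting rule in the side note can be verified with a few lines of code. This is a small Python sketch for illustration only (the course itself uses Excel, and the variable names are hypothetical); it writes 3.8E-14 out in full and counts the zeros after the decimal point.

```python
p_value = float("3.8E-14")   # the value Excel displays as 3.8E-14

# Write the value out in full decimal form (15 decimal places is enough here).
as_decimal = f"{p_value:.15f}"
print(as_decimal)

# Count the zeros between the decimal point and the first non-zero digit.
digits = as_decimal.split(".")[1]
leading_zeros = len(digits) - len(digits.lstrip("0"))
print(leading_zeros)

# Compare against alpha, as in the decision rule.
print(p_value < 0.05)
```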

Decision: Reject the null hypothesis.

Why? P-value is less than 0.05.

Conclusion: at least one mean differs across the grades.

Question 2: Group Comparisons

Now that we know at least one grade compa-ratio mean is not equal to the rest, we need
to determine which mean(s) differ. We do this by creating ranges of the possible difference in
the population mean values. Remember, that our sample results are only a close approximation
of the actual population mean. We can estimate the range of values that the population mean
actually equals (remember that discussion of the sampling distribution of the mean from last
week). So, using the variation that exists in our groups, we estimate the range of differences
between means (the possible outcomes of subtracting one mean from another).

The following screen shot shows a completed comparison table for the grade related
compa-ratio means.

Let’s look at what this table tells us before focusing on how to develop the values
(covered in Lecture 3 for this week). Looking at the Groups Compared Column, we see the
comparison groups listed, A-B for grades A and B, A-C for grades A and C, etc. The next
column is the difference between the average compa-ratio values for each pair of grades. The T
value column is the value for a 95% two tail test for the degrees of freedom we have. (Lecture 3
discusses how to identify the correct value). Note that it is the same value for all of our
comparison groups, the explanation comes in Lecture 3.

The next column, labeled the +/- term, is the margin of error that exists for the mean
difference being examined. This is a function of sampling error that exists within each sample
mean. These are all of the values we need to create a range of values that represent, with a 95%
confidence, what the actual population mean differences are likely to be. We subtract this value
from the mean difference (in column B) to get our low-end estimate (Low column values), and we
add it to the mean difference to get our high-end estimate (High column values).

Now, we need to decide which of these ranges indicates a significantly different pair of
means (within the population) and which ranges indicate the likelihood of equal population
means (non-significant differences). This is fairly simple: if the range contains a 0 (that is, one
endpoint is negative and the other is positive), then the difference is not significant (since a mean
difference of 0 would never be significant). Notice in the table that the A-B, A-C, and A-D
ranges all contain 0, and the results are not significantly different. The A-E and A-F comparisons,
however, have positive values for each end, and do not contain 0; these means are different in the
population.
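
The interval logic described above can be sketched as follows. This Python example is an illustration under stated assumptions, not the course's method (which builds the table with Excel cell formulas): the function names are hypothetical, the means, counts, and MS value are made-up numbers, and the t value of 2.0 is a placeholder for what Excel's T.INV.2T(alpha, df) would return.

```python
import math


def mean_difference_interval(mean1, mean2, n1, n2, ms_within, t_value):
    """Return (low, high) for |mean1 - mean2| +/- t * sqrt(MSwg * (1/n1 + 1/n2))."""
    diff = abs(mean1 - mean2)
    margin = t_value * math.sqrt(ms_within * (1 / n1 + 1 / n2))
    return diff - margin, diff + margin


def is_significant(low, high):
    """A range that contains 0 (low <= 0 <= high) is not a significant difference."""
    return not (low <= 0 <= high)


# Hypothetical inputs for illustration only (NOT taken from the course output).
t_value = 2.0      # in Excel this would come from T.INV.2T(alpha, df)
ms_within = 0.01   # MS for the Within Groups row of the ANOVA table

close_pair = mean_difference_interval(1.05, 1.00, n1=10, n2=10,
                                      ms_within=ms_within, t_value=t_value)
far_pair = mean_difference_interval(1.20, 1.00, n1=10, n2=10,
                                    ms_within=ms_within, t_value=t_value)

print(close_pair, is_significant(*close_pair))
print(far_pair, is_significant(*far_pair))
```

In this made-up case the first range straddles 0 and is not significant, while the second range is entirely positive and is significant, mirroring the A-B versus A-F pattern described in the lecture.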

We now know how to interpret an ANOVA table and an accompanying table of
differences for significant mean differences between and among groups.

Chi Square Tests

With the Chi Square tests, we are going to move from looking at population parameters,
such as means and standard deviations, and move to looking at patterns or distributions. The
shape or distribution of variables is often an important way of identifying differences that could
be important. For example, we already suspect that males and females are not distributed across
the grades in a similar manner. We will confirm or refute this idea in the weekly assignment.

Generally, when looking at distributions and patterns, we can create groups within our
variable of interest. For example, the Grades variable is already divided into 6 groups, making it
easy to count how many employees exist in each group. But what about a continuous variable
such as Compa-ratio, where no such clear division into separate groups exists? This is not a
problem, as we can always divide any range of values into groups such as quartiles (4 groups) or
any other number of distinct ranges. Most variables can be subdivided this way.
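
As a rough sketch of how a continuous variable could be split into quartile groups and counted,
the Python below uses the standard library's statistics.quantiles. The compa-ratio values are
hypothetical and made up for the example (they are not taken from the course data set), and the
helper name quartile_group is an assumption of this sketch.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical compa-ratio values, invented for illustration only.
compa = [0.92, 0.95, 0.98, 1.00, 1.02, 1.04,
         1.05, 1.08, 1.10, 1.12, 1.15, 1.20]

# Three cut points divide the values into four quartile groups.
cuts = quantiles(compa, n=4)


def quartile_group(value, cut_points):
    """Return the group number (1 to 4) that a value falls into."""
    return bisect_left(cut_points, value) + 1


# Count how many values land in each group, as we would for grades.
counts = Counter(quartile_group(v, cuts) for v in compa)
for group in sorted(counts):
    print("Quartile", group, "count:", counts[group])
```

Once the values are grouped this way, the counts per group can be treated just like the counts
per grade.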

The Chi Square test is actually a group of comparisons that depend upon the size of the
table the data is displayed in. We will examine different tables and tests in Lecture 3. For this
lecture, we want to focus on how to interpret the outcome of a Chi Square test, as outcomes are
interpreted the same way regardless of the table size. The details of setting up the data will be
covered in Lecture 3.

Example – Question 3

The third question for this week asks about employee grade distribution. We are
concerned here about the possible impact of an uneven distribution of males and females in
grades and how this might impact average salaries. While we are concerned about an uneven
distribution, our null hypothesis is always about equality, so the null would respond to a question
such as: are males and females distributed across the grades in a similar pattern? That is, are
either males or females more likely to be in some grades rather than others?

A similar question can be asked about degrees: are graduate and undergraduate degrees
distributed across grades in a similar pattern? If not, this might be part of the cause for unequal
salary averages.

The step 5 output for a Chi Square test is very simple: it is the p-value, the probability of
getting a chi square value as large as or larger than what we see if the null hypothesis is true.
That’s it: the data is set up, the Chi Square test function is selected from the Fx statistical list,
and we have the p-value. There is no output table to examine.

So, to examine whether degrees are distributed across grades in a similar manner, we
would have an actual distribution table (counts of what exists) looking like this:

          A    B    C    D    E    F   Total
UnderG    7    5    3    2    5    3      25
Grad      8    2    2    3    7    3      25
Total    15    7    5    5   12    6      50

This table would be compared to an expected table that shows what we would expect if the null
hypothesis were correct. (Setting up this table is discussed in Lecture 3.) Then we just get our
answer.
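
Although the lecture leaves the setup to Lecture 3, a minimal sketch of one standard way to build
an expected table may help here. It assumes the usual rule for a test of this kind, where each
expected count is (row total x column total) / grand total. The counts are the actual distribution
shown above; the function and variable names are made up for this sketch.

```python
# Actual counts from the table above, in grade order A through F.
grades = ["A", "B", "C", "D", "E", "F"]
observed = {
    "UnderG": [7, 5, 3, 2, 5, 3],
    "Grad": [8, 2, 2, 3, 7, 3],
}


def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = {label: sum(row) for label, row in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        label: [row_totals[label] * col / grand_total for col in col_totals]
        for label in table
    }


expected = expected_counts(observed)
for label, row in expected.items():
    print(label, dict(zip(grades, row)))
```

Because each degree row holds half of the 50 employees, every expected count here is simply half
of the grade's total.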

So, steps 5 and 6 would look like:

Step 5: Conduct the test. 0.85 (the Chi Square p-value from the Chisq.Test function)

Step 6: Conclusion and Interpretation

What is the p-value? 0.85

Decision on rejecting the null: Do Not Reject the null hypothesis.

Why? P-value is > 0.05.

Conclusion on impact of degrees? Degrees appear to be distributed similarly across the
grades and do not seem to be related to grade. This suggests they are not an
important factor in explaining differing salary averages among grades.
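
For readers who want to see where a p-value like this comes from outside of Excel, here is a
hedged sketch in plain Python. It assumes the expected counts follow the usual (row total x
column total) / grand total rule and that the degrees of freedom are (rows - 1) x (columns - 1).
The helper chi_square_sf is a hand-written series approximation of the chi square upper-tail
probability, written for this example; it is not an Excel or library function.

```python
import math

# Actual counts from the degree-by-grade table (grades A through F).
observed = [
    [7, 5, 3, 2, 5, 3],  # UnderG
    [8, 2, 2, 3, 7, 3],  # Grad
]


def chi_square_sf(x, df):
    """Upper-tail probability of a chi square value x with df degrees of
    freedom, using the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi_square_sf(chi_square, df)

alpha = 0.05
decision = "Reject the null" if p_value < alpha else "Do Not Reject the null"
print(round(chi_square, 3), df, round(p_value, 2), decision)
```

Under these assumptions the p-value comes out close to the 0.85 reported above (small differences
can arise from rounding), and it is well above 0.05, so the decision is the same: do not reject
the null hypothesis.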

Of course, getting the Chi Square result depends more on how the data is set up than is the
case with the other tests, but the overall interpretation is quite similar: does the p-value
indicate we should reject or not reject the null hypothesis claim as a description of the population?

Summary

Both the ANOVA and Chi Square tests follow the same basic logic developed last week
with the F and t-tests. The analysis starts by developing the first four (4) hypothesis testing
steps, which set up the purpose and decision-making rules for the analysis.

Running the tests (step 5) will be covered in the third lecture for this week.

Step 6 (Interpretation) is also done in the same fashion as last week. Look for the p-value
for each test and compare it to the alpha criterion. If the p-value is less than alpha, we reject the
null hypothesis.

When the null is rejected in the ANOVA test, we then create difference intervals to
determine which pairs of means differ. If any of these intervals contains the value 0 (meaning
one end is a negative value and the other is a positive value), we can say that those means are not
significantly different within the population.

The Chi Square test comes in two versions. One version looks at a single group
compared to an expected distribution, which we provide. The other version compares two or
more groups to an expected distribution, which is generated from the existing distributions. How
these “expected” tables are generated will be discussed in Lecture 3 for this week.

Please ask your instructor if you have any questions about this material.

When you have finished with this lecture, please respond to Discussion thread 2 for this
week with your initial response and responses to others over a couple of days.
