psychology 302
Do the problems in the file
Psychology 302, Winter 2020
Problem Set 2, due Wednesday, Feb 5 IN CLASS
Do all problems except #2 by hand and show your work
1. A researcher wants to study the relationship between extraversion and amount
of social interaction. She administers a measure of extraversion that ranges
from 1 to 20, where higher scores mean higher extraversion. She then observes
the number of social interactions between participants in a 30-minute period in
the lab. The data are as follows:
Participant   Extraversion score (X)   Number of social interactions (Y)
1             20                       7
2             5                        2
3             18                       9
4             6                        3
5             19                       8
6             8                        6
7             15                       3
8             7                        4
9             17                       7
10            16                       10
Mean:         13.10                    5.90
SD:           5.59                     2.62
a. Draw a scatterplot for the data
b. Calculate the correlation of X and Y using the formula I provided in
class
c. What proportion of variance in number of social interactions is
explained by extraversion score?
d. Fully describe this relationship in English (not stats-speak). That
means say something about the direction and strength of the
relationship and what it means.
2. Re-do problem #1 (a) and (b) using SPSS. Follow the guidelines in your
handout to make an attractive scatterplot. Hand in your SPSS output and circle
the values of r on the output.
3. Given the following set of paired scores for 5 subjects:
Sub ID:   1   2   3   4   5
X:        6   8   4   8   7
Y:        5   6   9   9   11
a. Construct a scatter plot for the data
b. Compute the value of the correlation coefficient
c. Add the following set of scores from a sixth subject to the data: X =
24, Y = 26
d. Add the new point to your scatterplot from (a) (or make a new plot)
e. Compute the correlation for the set of six paired scores
f. Explain why there is such a big difference between (b) and (e).
4. As the value of a correlation approaches ±1.0 (compared to a correlation close to
zero), what does it indicate about the following:
a. The shape of the scatterplot
b. The variability of the Y scores at each value of X
c. The accuracy with which we can predict Y if X is known
5. A recent study reported a relationship between anger levels (X) and blood
pressure (Y) in college-age participants. You want to see whether this is true in
a sample of 12 friends. You use an anger measure that ranges from 10 to 100,
where higher scores mean more anger. Systolic blood pressure (SBP) is
measured in millimeters of mercury, where higher values mean higher blood
pressure. I did a lot of the legwork for you on this one. Do the remaining
calculations by hand and show your work.
X            Y             X - M_X   Y - M_Y   (X - M_X)(Y - M_Y)
64           170           25.5      31.5      803.25
27           132           -11.5     -6.5      74.75
37           129           -1.5      -9.5      14.25
39           108           0.5       -30.5     -15.25
17           122           -21.5     -16.5     354.75
29           118           -9.5      -20.5     194.75
36           131           -2.5      -7.5      18.75
34           142           -4.5      3.5       -15.75
44           156           5.5       17.5      96.25
50           168           11.5      29.5      339.25
46           147           7.5       8.5       63.75
39           139           0.5       0.5       0.25
M_X = 38.5   M_Y = 138.5   Σ = 0     Σ = 0     Σ = 1929
S_X = 11.49   S_Y = 18.41
a. Calculate the correlation of anger level (X) and SBP (Y) using the formula I
provided in class
b. What proportion of variance in anger is explained by SBP?
c. Fully describe this relationship in English (not stats-speak). That means say
something about the direction and strength of the relationship and what it
means.
Scatterplots and Correlation
Correlation
• Useful tool to assess relationships
• Must have two variables measured on one set of people
• Correlation only measures strength of linear association
Linear relationships are not perfect lines
• Variables have variability (duh)
• Relationships may be generally linear even if all points are not on the line
Magnitude of r
Not all relationships are linear
Properties of r
• X & Y must be quantitative
  ◦ Interval or ratio
• It doesn't matter which variable is predictor and which is response
  ◦ $r_{xy} = r_{yx}$
Properties of r
• Correlation has no units
  ◦ So r can be compared for different variables
• Value of r is always between -1 and +1
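These properties can be checked numerically. The sketch below uses made-up height and weight data (hypothetical, not from the course) and a small helper, `pearson_r`, written for illustration from the deviation-based formula these notes build up to. It suggests that swapping X and Y, or changing the units of measurement, leaves r unchanged.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Pearson r: sum of cross-products divided by N times both SDs."""
    mx, my = fmean(xs), fmean(ys)
    sum_cp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sum_cp / (len(xs) * pstdev(xs) * pstdev(ys))

# Hypothetical data (an assumption for illustration only).
heights_in = [62, 65, 68, 70, 73]        # inches
weights_lb = [120, 150, 145, 170, 185]   # pounds

r_xy = pearson_r(heights_in, weights_lb)
r_yx = pearson_r(weights_lb, heights_in)  # predictor and response swapped

# Same people, different units: inches -> cm, pounds -> kg.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.4536 for w in weights_lb]
r_metric = pearson_r(heights_cm, weights_kg)

print(round(r_xy, 4), round(r_yx, 4), round(r_metric, 4))
```

All three printed values should agree, and each should fall between -1 and +1.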
Computing r
• Consider deviations around mean of X & Y
• $(X - M_X)$, $(Y - M_Y)$
Cross-Product
• To consider X & Y together, multiply their deviations
• $(X - M_X)(Y - M_Y)$
• Sign will be positive or negative
• Sum of cross-products is an indication of overall relationship
• $\sum (X - M_X)(Y - M_Y)$
• Mostly positive, sum = positive
• Mostly negative, sum = negative
• About equal pos & neg, sum ≈ 0
• Sum of cross-products can get very big if N is large
• So let's adjust for N
  $\frac{\sum (X - M_X)(Y - M_Y)}{N}$
• This is called the Covariance
Bill McDonald Speed Trap
Subject   X (Age)   Y (Speed)   X - M_X   Y - M_Y   (X - M_X)(Y - M_Y)
1         18        48          -10       3         -30
2         24        51          -4        6         -24
3         43        41          15        -4        -60
4         34        39          6         -6        -36
5         21        46          -7        1         -7
Sum:      140       225         0         0         -157
Mean:     28        45
SD:       9.23      4.42
Just for this sample, not estimating population
• Covariance = -157/5 = -31.4
• What does that mean?
  ◦ Is it big?
  ◦ Is it small?
  ◦ What are the units?
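The speed-trap covariance can be reproduced step by step. This is a minimal sketch using the five age and speed values from the table; the variable names are chosen here for illustration.

```python
ages = [18, 24, 43, 34, 21]     # X
speeds = [48, 51, 41, 39, 46]   # Y

n = len(ages)
mean_age = sum(ages) / n        # 28.0
mean_speed = sum(speeds) / n    # 45.0

# Deviations of each score from its own mean.
dev_age = [x - mean_age for x in ages]
dev_speed = [y - mean_speed for y in speeds]

# Cross-product for each driver, then the sum, then adjust for N.
cross_products = [dx * dy for dx, dy in zip(dev_age, dev_speed)]
sum_cp = sum(cross_products)    # -157.0
covariance = sum_cp / n         # about -31.4 (divides by N: sample only)

print(cross_products)
print(sum_cp, covariance)
```

The result is in "years times miles per hour" (assuming those are the units of age and speed), which is why the raw covariance is hard to interpret.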
• We need to get this on a scale that makes sense!
• Let's standardize it
  ◦ Divide by standard deviation of X and Y
  $\frac{\sum (X - M_X)(Y - M_Y)}{N S_X S_Y}$
• When you standardize the covariance, what you get is r
• The units cancel out!
  $r = \frac{-31.4}{(9.23)(4.42)} = -.77$
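As a rough check on the arithmetic, the sketch below standardizes the speed-trap covariance in Python. It uses `statistics.pstdev`, which divides by N, to match the "just for this sample" standard deviations in the table.

```python
from statistics import pstdev

ages = [18, 24, 43, 34, 21]     # X
speeds = [48, 51, 41, 39, 46]   # Y
n = len(ages)

mean_age = sum(ages) / n
mean_speed = sum(speeds) / n
covariance = sum((x - mean_age) * (y - mean_speed)
                 for x, y in zip(ages, speeds)) / n

sd_age = pstdev(ages)       # about 9.23
sd_speed = pstdev(speeds)   # about 4.43 unrounded (the table lists 4.42)

# Dividing by both SDs cancels the units and gives r.
r = covariance / (sd_age * sd_speed)
print(round(r, 2))  # -0.77
```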
Cohen's guidelines for Corr
• These are for the behavioral sciences only
  ◦ ~.10 = small
  ◦ ~.30 = medium
  ◦ ~.50 = large
Interpret in words!
• Younger drivers tended to be driving faster when pulled over.
• Watch wording!
  ◦ Do interpret direction
  ◦ Don't use causal language
    - Being young doesn't CAUSE people to drive fast
r as a measure of effect size
• Variance in a variable can be partitioned
  ◦ Explained + unexplained = total
John Venn
$r^2$ is proportion of variance explained
• $r_{\text{age,speed}} = -.77$
• That means $(-.77)^2 = .59$ (59%) of variance in speed is explained by or predicted by age
  ◦ Not necessarily caused by age
• 1 – .59 = .41
• 41% of the variability in speed is NOT explained by age
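The same partition can be written as a few lines of Python. This is only a sketch of the arithmetic, starting from the rounded r of -.77. One caution if you type this into Python yourself: `-0.77 ** 2` is read as `-(0.77 ** 2)`, so store r in a variable or use parentheses.

```python
r = -0.77                    # correlation between age and speed

explained = r ** 2           # proportion of variance in speed explained by age
unexplained = 1 - explained  # proportion NOT explained by age

print(f"{explained:.2f} {unexplained:.2f}")  # 0.59 0.41
```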
Factors that affect size of r
• Biased samples
  ◦ Restricted range
    - Tends to make corr smaller
  ◦ Extreme groups
    - e.g., only use people really high or low on X
    - Tends to make corr larger
Design Issues
• N should be very big (> 30)
  ◦ Small samples are too unstable
• Wide variability on X & Y
  ◦ No restriction of range
Factors that affect size of r
• Combined groups
  ◦ May mask relationship in each group
• r is very affected by outliers
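To see how much a single outlier can move r, here is a small sketch with made-up scores (hypothetical data, not from the course or the problem set). The helper `pearson_r` is defined here for illustration and uses the same sum-of-cross-products formula as the speed-trap example.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Pearson r: sum of cross-products divided by N times both SDs."""
    mx, my = fmean(xs), fmean(ys)
    sum_cp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sum_cp / (len(xs) * pstdev(xs) * pstdev(ys))

# Hypothetical scores with almost no linear relationship.
xs = [2, 4, 5, 7, 8]
ys = [6, 3, 7, 4, 5]
r_without = pearson_r(xs, ys)

# Add one extreme case that is far above the rest on both X and Y.
r_with = pearson_r(xs + [30], ys + [25])

print(round(r_without, 2), round(r_with, 2))
```

With these numbers, the weak relationship among the first five cases turns into a very strong positive correlation once the extreme case is included. This is one reason to always look at the scatterplot before trusting r.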
Bivariate Regression
Straight Lines
• Simple way to describe a relationship
• Remember the equation for a straight line?
  ◦ y = mx + b
• What is m? What is b?
• How do you compute the equation?
(x1,y1)
(x2,y2)
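As a reminder of how the equation comes from two points, here is a minimal sketch. The function name `line_through` and the example points are assumptions made for illustration.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

# Hypothetical points (x1, y1) = (1, 3) and (x2, y2) = (4, 9).
m, b = line_through((1, 3), (4, 9))
print(m, b)  # 2.0 1.0
```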
What if every point is not on the line?
• Straight line may be good description even if not all points are on the line
Computing the line when points are scattered
• $\hat{Y} = a + bX$
• Y-hat ($\hat{Y}$) means predicted value of Y
• Computing the slope:
• $b = \frac{\sum (X - M_X)(Y - M_Y)}{\sum (X - M_X)^2}$
• It's still rise/run, but now we also consider variability in X and Y
Computing the intercept
• $a = \hat{Y} - bX$
• Need to plug in values of $(X, \hat{Y})$
• Can't use just any Y or X!
  ◦ Line would be very different depending on which ones you chose
• Must have X and Y that we know are on the line
  ◦ mean of X and mean of Y
Computing the intercept
• Regression line will always go through the mean of X and mean of Y
• $a = M_Y - bM_X$
• Let's try it with our example from before
X (# of kids)   Y (hours of housework)   X - M_X   Y - M_Y   (X - M_X)(Y - M_Y)   (X - M_X)^2
1               1                        -1.75     -2.5      4.375                3.063
1               2                        -1.75     -1.5      2.625                3.063
1               3                        -1.75     -0.5      0.875                3.063
2               6                        -0.75     2.5       -1.875               0.563
2               4                        -0.75     0.5       -0.375               0.563
2               1                        -0.75     -2.5      1.875                0.563
3               5                        0.25      1.5       0.375                0.063
3               0                        0.25      -3.5      -0.875               0.063
4               6                        1.25      2.5       3.125                1.563
4               3                        1.25      -0.5      -0.625               1.563
5               7                        2.25      3.5       7.875                5.063
5               4                        2.25      0.5       1.125                5.063
M_X = 2.75      M_Y = 3.5                Σ = 0     Σ = 0     Σ = 18.5             Σ = 24.25
Computing the equation
• $b = \frac{18.5}{24.25} = .76$
• $a = 3.5 - .76(2.75)$
• $= 1.41$
• $\hat{Y} = 1.41 + .76X$
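The slope and intercept can be reproduced from the kids and housework table. This is a minimal sketch; the variable names are chosen here for illustration.

```python
kids = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]    # X
hours = [1, 2, 3, 6, 4, 1, 5, 0, 6, 3, 7, 4]   # Y

n = len(kids)
mean_x = sum(kids) / n     # 2.75
mean_y = sum(hours) / n    # 3.5

# Sum of cross-products and sum of squared X deviations.
sum_cp = sum((x - mean_x) * (y - mean_y) for x, y in zip(kids, hours))  # 18.5
ss_x = sum((x - mean_x) ** 2 for x in kids)                             # 24.25

b = sum_cp / ss_x          # slope, about 0.763
a = mean_y - b * mean_x    # intercept, about 1.40 with the unrounded slope

print(round(b, 2), round(a, 2))
```

The intercept prints as 1.4 rather than 1.41 because the code keeps the unrounded slope. Rounding b to .76 first, as in the hand calculation, gives 3.5 – .76(2.75) = 1.41.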
Interpreting the coefficients
• Slope
  ◦ For a one unit increase in X, we predict a b unit increase in Y
    - What does that mean for this study?
• Intercept
  ◦ The predicted value of Y when X = 0
    - What does that mean for this study?
Interpreting the coefficients
• Slope
  ◦ For each additional child, we predict parents will do an additional .76 hours of housework per day
• Intercept
  ◦ For a family with zero kids, we predict they will do 1.41 hours of housework per day
Drawing the regression line
• Need to plot two points
  ◦ $(M_X, M_Y)$
  ◦ Y-intercept
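The two points for the housework example can be computed as a quick sketch. It assumes the rounded coefficients from the worked example (a = 1.41, b = .76); the `predict` helper is a name chosen for illustration.

```python
a, b = 1.41, 0.76   # rounded intercept and slope from the worked example

def predict(x):
    """Predicted hours of housework (Y-hat) for x kids."""
    return a + b * x

mean_x, mean_y = 2.75, 3.5

# Point 1: the line passes through the means of X and Y.
point_at_means = (mean_x, round(predict(mean_x), 2))   # (2.75, 3.5)

# Point 2: the Y-intercept, where X = 0.
point_at_intercept = (0, predict(0))                   # (0, 1.41)

print(point_at_means, point_at_intercept)
```

Plotting these two points and connecting them with a straight edge gives the regression line.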