Psychology 302, Winter 2020

Problem Set 2, due Wednesday, Feb 5 IN CLASS

Do all problems except #2 by hand and show your work

1. A researcher wants to study the relationship between extraversion and amount
of social interaction. She administers a measure of extraversion that ranges
from 1-20, where higher scores mean higher extraversion. She then observes
the number of social interactions between participants in a 30 minute period in
the lab. The data are as follows:

Participant   Extraversion score (X)   Number of social interactions (Y)
1             20                       7
2             5                        2
3             18                       9
4             6                        3
5             19                       8
6             8                        6
7             15                       3
8             7                        4
9             17                       7
10            16                       10

Mean:         13.10                    5.90
SD:           5.59                     2.62

a. Draw a scatterplot for the data.
b. Calculate the correlation of X and Y using the formula I provided in class.
c. What proportion of variance in number of social interactions is explained by extraversion score?
d. Fully describe this relationship in English (not stats-speak). That means say something about the direction and strength of the relationship and what it means.

2. Re-do problem #1 (a) and (b) using SPSS. Follow the guidelines in your
handout to make an attractive scatterplot. Hand in your SPSS output and circle
the value of r on the output.

3. Given the following set of paired scores for 5 subjects:

Sub ID: 1 2 3 4 5
X 6 8 4 8 7
Y 5 6 9 9 11

a. Construct a scatterplot for the data.
b. Compute the value of the correlation coefficient.
c. Add the following set of scores from a sixth subject to the data: X = 24, Y = 26.
d. Add the new point to your scatterplot from (a) (or make a new plot).
e. Compute the correlation for the set of six paired scores.
f. Explain why there is such a big difference between (b) and (e).

4. As the value of a correlation approaches ±1.0 (compared to a correlation close to
zero), what does it indicate about the following:

a. The shape of the scatterplot
b. The variability of the Y scores at each value of X
c. The accuracy with which we can predict Y if X is known

5. A recent study reported a relationship between anger levels (X) and blood
pressure (Y) in college-age participants. You want to see whether this is true in
a sample of 12 friends. You use an anger measure that ranges from 10-100
where higher scores mean more anger. Systolic blood pressure (SBP) is
measured in millimeters of mercury, where higher values mean higher blood
pressure. I did a lot of the legwork for you on this one. Do the remaining
calculations by hand and show your work.

X       Y       X − M_X   Y − M_Y   (X − M_X)(Y − M_Y)
64      170     25.5      31.5      803.25
27      132     -11.5     -6.5      74.75
37      129     -1.5      -9.5      14.25
39      108     0.5       -30.5     -15.25
17      122     -21.5     -16.5     354.75
29      118     -9.5      -20.5     194.75
36      131     -2.5      -7.5      18.75
34      142     -4.5      3.5       -15.75
44      156     5.5       17.5      96.25
50      168     11.5      29.5      339.25
46      147     7.5       8.5       63.75
39      139     0.5       0.5       0.25

Mean:   38.5    138.5
Sum:                      0         0         1929
S_X = 11.49   S_Y = 18.41

a. Calculate the correlation of anger level (X) and SBP (Y) using the formula I provided in class.
b. What proportion of variance in anger is explained by SBP?
c. Fully describe this relationship in English (not stats-speak). That means say something about the direction and strength of the relationship and what it means.

Scatterplots and Correlation


• Useful tool to assess relationships
• Must have two variables measured on one set of

• Correlation only measures strength of linear relationships

Linear relationships are not perfect lines

• Variables have variability (duh)
• Relationships may be generally linear even if all points are not on the line

Magnitude of r

Not all relationships are linear


Properties of r

• X & Y must be quantitative
  ◦ Interval or ratio

• It doesn't matter which variable is predictor and which is response
  ◦ $r_{xy} = r_{yx}$

Properties of r

• Correlation has no units
  ◦ So r can be compared for different variables

• Value of r is always between -1 and +1
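
These properties can be checked numerically. The sketch below is only an illustration: the height and weight values are made up (not course data), the helper name `pearson_r` is hypothetical, and it assumes every standard deviation is computed by dividing by N.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation using N (not N - 1) in every denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Hypothetical data, invented for illustration only.
height_in = [60, 62, 65, 68, 70, 73]
weight_lb = [115, 130, 128, 150, 165, 180]

r_xy = pearson_r(height_in, weight_lb)
r_yx = pearson_r(weight_lb, height_in)      # swap predictor and response
height_cm = [h * 2.54 for h in height_in]   # change the units of X
r_cm = pearson_r(height_cm, weight_lb)

print(math.isclose(r_xy, r_yx))  # True: r_xy = r_yx
print(math.isclose(r_xy, r_cm))  # True: r has no units
print(-1 <= r_xy <= 1)           # True: r stays between -1 and +1
```

Converting inches to centimeters rescales both the deviations and the standard deviation of X by the same factor, which is why r does not move.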

Computing r

• Consider deviations around mean of X & Y
• $(X - M_X)$, $(Y - M_Y)$

• To consider X & Y together, multiply their deviations
• $(X - M_X)(Y - M_Y)$
• Sign will be positive or negative

• Sum of cross-products is an indication of overall relationship
• $\sum (X - M_X)(Y - M_Y)$

• Mostly positive, sum = positive
• Mostly negative, sum = negative
• About equal pos & neg, sum ≈ 0

• Sum of cross-products can get very big if N is large

• So let's adjust for N

$$\frac{\sum (X - M_X)(Y - M_Y)}{N}$$

• This is called the Covariance


Bill McDonald Speed Trap

Subject   X (Age)   Y (Speed)   X − M_X   Y − M_Y   (X − M_X)(Y − M_Y)
1         18        48          -10       3         -30
2         24        51          -4        6         -24
3         43        41          15        -4        -60
4         34        39          6         -6        -36
5         21        46          -7        1         -7

Sum:      140       225         0         0         -157
Mean:     28        45
SD:       9.23      4.42

Just for this sample, not
estimating population

• Covariance = -157/5 = -31.4

• What does that mean?
  ◦ Is it big?
  ◦ Is it small?
  ◦ What are the units?
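
The covariance arithmetic for the speed-trap sample can be checked with a short Python sketch. Variable names are illustrative, and the sum is divided by N (not N - 1) because the slides describe this sample only.

```python
# Speed-trap sample from the table above (5 drivers).
age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]

n = len(age)
mean_age = sum(age) / n        # 28.0
mean_speed = sum(speed) / n    # 45.0

# Cross-product of deviations for each driver.
cross_products = [(x - mean_age) * (y - mean_speed) for x, y in zip(age, speed)]

# Average cross-product: divide by N, since we are not estimating a population.
covariance = sum(cross_products) / n

print(cross_products)  # [-30.0, -24.0, -60.0, -36.0, -7.0]
print(covariance)      # -31.4
```

The result is in "years × speed units", which is why the raw number is hard to judge as big or small.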

• We need to get this on a scale that makes sense

• Let's standardize it
  ◦ Divide by standard deviation of X and Y

$$r = \frac{\sum (X - M_X)(Y - M_Y)}{N \, S_X S_Y}$$

• When you standardize the covariance, what you get is r
• The units cancel out!

$$r = \frac{-31.4}{(9.23)(4.42)} = -.77$$
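
The standardizing step can be sketched in Python as a check on the hand calculation. Names are illustrative, and all denominators use N, matching the sample-only standard deviations on the slide.

```python
import math

age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]
n = len(age)
mean_age, mean_speed = sum(age) / n, sum(speed) / n

# Covariance: average cross-product of deviations.
covariance = sum((x - mean_age) * (y - mean_speed) for x, y in zip(age, speed)) / n

# Standard deviations, dividing by N.
sd_age = math.sqrt(sum((x - mean_age) ** 2 for x in age) / n)
sd_speed = math.sqrt(sum((y - mean_speed) ** 2 for y in speed) / n)

# Standardize the covariance to get r.
r = covariance / (sd_age * sd_speed)

print(round(sd_age, 2), round(sd_speed, 2))  # 9.23 4.43
print(round(r, 2))                           # -0.77
```

The standard deviation of speed rounds to 4.43 here, while the slide lists 4.42; r comes out to -.77 either way.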

Cohen's guidelines for Corr

• These are for the behavioral sciences only
  ◦ ~.10 = small
  ◦ ~.30 = medium
  ◦ ~.50 = large

Interpret in words!

• Younger drivers tended to be driving faster when pulled over.

• Watch wording!
  ◦ Do interpret direction
  ◦ Don't use causal language
    - Being young doesn't CAUSE people to drive fast


r as a measure of effect size

• Variance in a variable can be partitioned
  ◦ Explained + unexplained = total

John Venn

r² is proportion of variance explained

• $r_{age,speed} = -.77$
• That means $(-.77)^2 = .59$ (59%) of variance in speed is explained by or predicted by age
  ◦ Not necessarily caused by age

• 1 - .59 = .41
• 41% of the variability in speed is NOT explained by age
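
The same partition in a few lines of Python, as a minimal sketch that starts from the rounded r on the slide (variable names are illustrative):

```python
r = -0.77                    # correlation of age and speed, rounded as on the slide
r_squared = r ** 2           # proportion of variance explained
unexplained = 1 - r_squared  # proportion of variance not explained

print(round(r_squared, 2))    # 0.59
print(round(unexplained, 2))  # 0.41
```

Squaring removes the sign, so r² says how much variance is shared but not the direction of the relationship.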

Factors that affect size of r

• Biased samples
  ◦ Restricted range
    - Tends to make corr smaller
  ◦ Extreme groups
    - e.g., only use people really high or low on X
    - Tends to make corr larger

Design Issues

• N should be very big (> 30)
  ◦ Small samples are too unstable

• Wide variability on X & Y
  ◦ No restriction of range

Factors that affect size of r

• Combined groups
  ◦ May mask relationship in each group

• r is very affected by outliers
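
To see how much a single point can move r, here is a rough sketch that reuses the speed-trap sample and adds one invented driver. The sixth driver (age 70, speed 80) is hypothetical and not part of the course data, and the function name `correlation` is illustrative.

```python
import math

def correlation(xs, ys):
    """Pearson r from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

age = [18, 24, 43, 34, 21]
speed = [48, 51, 41, 39, 46]
r_original = correlation(age, speed)

# Hypothetical sixth driver, invented for illustration: age 70, speed 80.
r_with_outlier = correlation(age + [70], speed + [80])

print(round(r_original, 2))      # -0.77
print(round(r_with_outlier, 2))  # 0.73
```

With only five real cases, one extreme point is enough to flip a strong negative correlation into a strong positive one, which is also why small samples are unstable.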


Bivariate Regression

Straight Lines

• Simple way to describe a relationship
• Remember the equation for a straight line?
  ◦ y = mx + b

• What is m? What is b?

• How do you compute the equation?



What if every point is not on the line?

• Straight line may be good description even if not all points are on the line

Computing the line when points are scattered

• $\hat{Y} = a + bX$
• Y-hat means predicted value of Y
• Computing the slope:

• $b = \frac{\sum (X - M_X)(Y - M_Y)}{\sum (X - M_X)^2}$

• It's still rise/run, but now we also consider variability in X and Y

Computing the intercept

• $a = \hat{Y} - bX$
• Need to plug in values of (X, $\hat{Y}$)
• Can't use just any Y or X!
  ◦ Line would be very different depending on which ones you chose

• Must have X and Y that we know are on the line
  ◦ mean of X and mean of Y


Computing the intercept

• Regression line will always go through the mean of X and mean of Y

• $a = M_Y - b M_X$

• Let's try it with our example from before

X (# of kids)   Y (hours of housework)   X − M_X   Y − M_Y   (X − M_X)(Y − M_Y)   (X − M_X)²
1               1                        -1.75     -2.5      4.375                3.063
1               2                        -1.75     -1.5      2.625                3.063
1               3                        -1.75     -0.5      0.875                3.063
2               6                        -0.75     2.5       -1.875               0.563
2               4                        -0.75     0.5       -0.375               0.563
2               1                        -0.75     -2.5      1.875                0.563
3               5                        0.25      1.5       0.375                0.063
3               0                        0.25      -3.5      -0.875               0.063
4               6                        1.25      2.5       3.125                1.563
4               3                        1.25      -0.5      -0.625               1.563
5               7                        2.25      3.5       7.875                5.063
5               4                        2.25      0.5       1.125                5.063

M_X = 2.75      M_Y = 3.5                Σ = 0     Σ = 0     Σ = 18.5             Σ = 24.25

Computing the equation

• $b = \frac{18.5}{24.25} = .76$

• $a = 3.5 - .76(2.75)$
• $= 1.41$

• $\hat{Y} = 1.41 + .76X$
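
The slope and intercept can be recomputed from the raw kids and housework data with a short Python sketch. Variable names are illustrative; the formulas are the sum of cross-products over the sum of squared X deviations for b, and the mean of Y minus b times the mean of X for a.

```python
kids = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
hours = [1, 2, 3, 6, 4, 1, 5, 0, 6, 3, 7, 4]

n = len(kids)
mean_x = sum(kids) / n    # 2.75
mean_y = sum(hours) / n   # 3.5

# Sum of cross-products and sum of squared X deviations.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(kids, hours))  # 18.5
ss_x = sum((x - mean_x) ** 2 for x in kids)                         # 24.25

b = sp / ss_x             # slope
a = mean_y - b * mean_x   # intercept, using the unrounded slope

print(round(b, 2))  # 0.76
print(round(a, 2))  # 1.4
```

The intercept prints as 1.4 rather than 1.41 because the code carries the unrounded slope; rounding b to .76 first, as in the hand calculation, gives 3.5 - 2.09 = 1.41.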

Interpreting the coefficients
• Slope
  ◦ For a one-unit increase in X, we predict a b-unit increase in Y
    - What does that mean for this study?

• Intercept
  ◦ The predicted value of Y when X = 0
    - What does that mean for this study?

Interpreting the coefficients

• Slope
  ◦ For each additional child, we predict parents will do an additional .76 hours of housework per day

• Intercept
  ◦ For a family with zero kids, we predict they will do 1.41 hours of housework per day

Drawing the regression line

• Need to plot two points
  ◦ $(M_X, M_Y)$
  ◦ Y-intercept
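
As a rough sketch of the two points needed for the plot, the snippet below evaluates the fitted housework line at X = 0 and at the mean of X. The function name `predict` is illustrative, and the coefficients are the rounded values from the worked example.

```python
a, b = 1.41, 0.76           # intercept and slope, rounded as in the worked example
mean_x, mean_y = 2.75, 3.5  # means of kids and hours of housework

def predict(x):
    """Predicted hours of housework for a family with x kids."""
    return a + b * x

y_intercept_point = (0, predict(0))      # where the line crosses the Y axis
mean_point = (mean_x, predict(mean_x))   # the line passes through the means

print(y_intercept_point)        # (0, 1.41)
print(round(mean_point[1], 2))  # 3.5
```

Drawing a straight line through these two points gives the regression line; any other X can be plugged into `predict` the same way.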

