Linear Project

 


Instructions

Instructions are found in the Linear Project module.

You may submit all of your project in one document or a combination of documents, which may consist of word processing documents or spreadsheets or scanned handwritten work, provided it is clearly labeled where each task can be found. Be sure to include your name. Projects are graded on the basis of completeness, correctness, ease in locating all of the items, and strength of the narrative portions.

Scatterplots, Linear Regression, and Correlation


When we have a set of data, often we would like to develop a model that fits the data.

First we graph the data points (x, y) to get a scatterplot. Take the data, determine an appropriate scale

on the horizontal axis and the vertical axis, and plot the points, carefully labeling the scale and axes.

Summer Olympics: Men’s 400 Meter Dash Winning Times

Year (x)    Time (y) (seconds)
1948        46.20
1952        45.90
1956        46.70
1960        44.90
1964        45.10
1968        43.80
1972        44.66
1976        44.26
1980        44.60
1984        44.27
1988        43.87
1992        43.50
1996        43.49
2000        43.84
2004        44.00
2008        43.75

Burger                           Fat (x) (grams)    Calories (y)
Wendy’s Single                   20                 420
BK Whopper Jr.                   24                 420
McDonald’s Big Mac               28                 530
Wendy’s Big Bacon Classic        30                 580
Hardee’s The Works               30                 530
McDonald’s Arch Deluxe           34                 610
BK King Double Cheeseburger      39                 640
Jack in the Box Jumbo Jack       40                 650
BK Big King                      43                 660
BK King Whopper                  46                 730

Data from 1997

If the scatterplot shows a relatively linear trend, we try to fit a linear model, to find a line of best fit.

We could pick two arbitrary data points and find the line through them, but that would not necessarily

provide a good linear model representative of all the data points.

A mathematical procedure that finds a line of “best fit” is called linear regression. This procedure is also

called the method of least squares, as it minimizes the sum of the squares of the deviations of the points

from the line. In MATH 107, we use software to find the regression line. (We can use Microsoft Excel, or

Open Office, or a hand-held calculator or an online calculator — more on this in the Technology Tips

topic.)
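As a rough illustration of what the software is doing, the least-squares slope and intercept can be computed directly from the data. The sketch below uses Python, which is not one of the tools named above, so treat the choice of language as an assumption; the function name least_squares_line is hypothetical. The data are the Men’s 400 Meter Dash winning times from the table above.

```python
# Minimal sketch of least-squares line fitting (illustrative only;
# Python is an assumed tool, not one the course requires).
years = list(range(1948, 2012, 4))  # 1948, 1952, ..., 2008 (16 Olympics)
times = [46.20, 45.90, 46.70, 44.90, 45.10, 43.80, 44.66, 44.26,
         44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum of
    the squared vertical deviations of the points from the line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_line(years, times)
print(f"y = {slope:.4f}x + {intercept:.2f}")
```

For this data set the result should agree, up to rounding, with the regression line that Excel reports in the sample project.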

Linear regression software also typically reports parameters denoted by r or r².

The real number r is called the correlation coefficient and provides a measure of the strength of the

linear relationship.

r is a real number between −1 and 1.

r = 1 indicates perfect positive correlation — the regression line has positive slope and all of the data

points are on the line.

r = −1 indicates perfect negative correlation — the regression line has negative slope and all of the

data points are on the line.

The closer |r| is to 1, the stronger the linear correlation. If r = 0, there is no linear correlation at all. The

following examples provide a sense of what an r value indicates.

Source: The Basic Practice of Statistics, David S. Moore, page 108.

Notice that a positive r value is associated with an increasing trend and a negative r value is associated

with a decreasing trend. The strongest linear models have r values close to 1 or close to −1.
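To make the meaning of r concrete, here is a minimal Python sketch that computes the correlation coefficient from its standard definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name correlation is hypothetical, and Python itself is an assumed tool rather than one the course prescribes. The data are the Men’s 400 Meter Dash winning times from the table above.

```python
import math

# Illustrative sketch: correlation coefficient r from its definition.
years = list(range(1948, 2012, 4))  # 1948, 1952, ..., 2008
times = [46.20, 45.90, 46.70, 44.90, 45.10, 43.80, 44.66, 44.26,
         44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def correlation(xs, ys):
    """Return the correlation coefficient r, a number between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(years, times)
print(f"r = {r:.2f}")  # negative, because times decrease as years increase
```

Points that lie exactly on an increasing line, such as (1, 2), (2, 4), (3, 6), give r = 1 with this function.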

The nonnegative real number r² is called the coefficient of determination and is the square of the correlation coefficient r.

Since 0 ≤ |r| ≤ 1, multiplying through by |r| gives 0 ≤ |r|² ≤ |r| ≤ 1, and |r|² = r².

So, 0 ≤ r² ≤ 1. The closer r² is to 1, the stronger the indication of a linear relationship.

Some software packages (such as Excel) report r², and so to get r, take the square root of r² and determine the sign of r by observing the trend (+ for increasing, − for decreasing).
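The rule for recovering r from r² can be sketched in a few lines of Python. This is only an illustration: the function name r_from_r_squared is hypothetical, and using the sign of the regression slope to stand in for "observing the trend" is an assumption that holds because r and the slope always share the same sign.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover r from r^2: take the square root, then attach the sign
    of the regression slope (+ for increasing, - for decreasing)."""
    return math.copysign(math.sqrt(r_squared), slope)


# Example with the values reported in the sample project:
# r^2 = 0.6991 and a negative slope of -0.0431.
print(round(r_from_r_squared(0.6991, -0.0431), 2))
```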


(Sample) Curve-Fitting Project – Linear Model: Men’s 400 Meter Dash Submitted by Suzanne Sands

(LR-1) Purpose: To analyze the winning times for the Olympic Men’s 400 Meter Dash using a linear model

Data: The winning times were retrieved from http://www.databaseolympics.com/sport/sportevent.htm?sp=ATH&enum=130

The winning times were gathered for the most recent 16 Summer Olympics, post-WWII. (More data was available, back to 1896.)

DATA:

Summer Olympics: Men’s 400 Meter Dash Winning Times

Year    Time (seconds)
1948    46.20
1952    45.90
1956    46.70
1960    44.90
1964    45.10
1968    43.80
1972    44.66
1976    44.26
1980    44.60
1984    44.27
1988    43.87
1992    43.50
1996    43.49
2000    43.84
2004    44.00
2008    43.75

(LR-2) SCATTERPLOT:

As one would expect, the winning times generally show a downward trend, as stronger competition and training

methods result in faster speeds. The trend is somewhat linear.

[Figure: scatterplot, "Summer Olympics: Men’s 400 Meter Dash Winning Times"; horizontal axis: Year (1944 to 2008); vertical axis: Time (seconds) (43.00 to 47.00)]


(LR-3)

Line of Best Fit (Regression Line)

y = −0.0431x + 129.84 where x = Year and y = Winning Time (in seconds)

(LR-4) The slope is −0.0431 and is negative since the winning times are generally decreasing.

The slope indicates that in general, the winning time decreases by 0.0431 second a year, and so the winning time decreases at an

average rate of 4(0.0431) = 0.1724 second each 4-year Olympic interval.

[Figure: scatterplot with regression line, "Summer Olympics: Men’s 400 Meter Dash Winning Times"; horizontal axis: Year (1944 to 2008); vertical axis: Time (seconds) (43.00 to 47.00); regression line y = −0.0431x + 129.84, R² = 0.6991]


(LR-5) Values of r² and r:

r² = 0.6991

We know that the slope of the regression line is negative so the correlation coefficient r must be negative.

r = −√0.6991 ≈ −0.84

Recall that r = −1 corresponds to perfect negative correlation, and so r = −0.84 indicates moderately strong negative correlation (relatively close to −1 but not very strong).

(LR-6) Prediction: For the 2012 Summer Olympics, substitute x = 2012 to get y = −0.0431(2012) + 129.84 ≈ 43.1 seconds.

The regression line predicts a winning time of 43.1 seconds for the Men’s 400 Meter Dash in the 2012 Summer Olympics in London.
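The substitution in (LR-6) can be checked with a short Python sketch. The function name predict_winning_time is hypothetical; the default slope and intercept are the rounded values from the regression line in (LR-3).

```python
def predict_winning_time(year, slope=-0.0431, intercept=129.84):
    """Evaluate the regression line y = slope * x + intercept at x = year."""
    return slope * year + intercept


# Prediction for the 2012 Summer Olympics, rounded to one decimal place.
print(round(predict_winning_time(2012), 1))
```

Because the coefficients are rounded, a prediction made with the unrounded regression coefficients can differ slightly in the later decimal places.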

(LR-7) Narrative:

The data consisted of the winning times for the men’s 400m event in the Summer Olympics, for 1948 through 2008. The data exhibit

a moderately strong downward linear trend, looking overall at the 60 year period.

The regression line predicts a winning time of 43.1 seconds for the 2012 Summer Olympics, which would be nearly 0.4 second less

than the existing Olympic record of 43.49 seconds, quite a feat!

Will the regression line’s prediction be accurate? In the last two decades, there appears to be more of a cyclical (up and down)

trend. Could winning times continue to drop at the same average rate? Extensive searches for talented potential athletes and

improved full-time training methods can lead to decreased winning times, but ultimately, there will be a physical limit for humans.

Note that there were some unusual data points of 46.70 seconds in 1956 and 43.80 seconds in 1968, which are far above and far below the regression line, respectively.

If we restrict ourselves to the most recent winning times, from 1972 onward (10 winning times), we have the following scatterplot and regression line.


Using the most recent ten winning times, our regression line is y = −0.025x + 93.834.

When x = 2012, the prediction is y = −0.025(2012) + 93.834 ≈ 43.5 seconds. This line predicts a winning time of 43.5 seconds for 2012 and

that would indicate an excellent time close to the existing record of 43.49 seconds, but not dramatically below it.

Note too that for r² = 0.5351 and for the negatively sloping line, the correlation coefficient is r = −√0.5351 ≈ −0.73, not as strong as when we considered the time period going back to 1948. The most recent set of 10 winning times does not visually exhibit as strong a linear trend as the set of 16 winning times dating back to 1948.
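The second model can be reproduced by fitting only the ten winning times from 1972 through 2008. The Python sketch below is an illustration under assumptions: the function name fit_line is hypothetical, and the sample project itself used Microsoft Excel rather than code.

```python
# Illustrative sketch: regression on the ten most recent winning times.
years = list(range(1972, 2012, 4))  # 1972, 1976, ..., 2008
times = [44.66, 44.26, 44.60, 44.27, 43.87, 43.50, 43.49, 43.84, 44.00, 43.75]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_line(years, times)
print(f"y = {slope:.3f}x + {intercept:.3f}, r^2 = {r_squared:.4f}")
```

Running the same function on all 16 winning times allows a direct comparison of the two r² values.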

CONCLUSION:

I have examined two linear models, using different subsets of the Olympic winning times for the men’s 400 meter dash and both have

moderately strong negative correlation coefficients. One model uses data extending back to 1948 and predicts a winning time of 43.1 seconds

for the 2012 Olympics, and the other model uses data from the most recent 10 Olympic games and predicts 43.5 seconds. My guess is that 43.5

will be closer to the actual winning time. We will see what happens later this summer!

UPDATE: When the race was run in August, 2012, the winning time was 43.94 seconds.
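With the actual result known, the two predictions can be compared by their errors. The short Python sketch below only restates the numbers already given in this project; the dictionary labels are hypothetical names for the two models.

```python
# Compare each model's 2012 prediction with the actual winning time.
actual = 43.94  # seconds, from the update above
predictions = {
    "1948-2008 model": 43.1,  # all 16 winning times
    "1972-2008 model": 43.5,  # most recent 10 winning times
}

# Absolute error of each prediction, in seconds.
errors = {name: abs(predicted - actual) for name, predicted in predictions.items()}
for name, error in errors.items():
    print(f"{name}: off by {error:.2f} seconds")

closest = min(errors, key=errors.get)
print("Closer prediction:", closest)
```

Both models predicted a faster time than was actually run, and the model based on the most recent ten Olympics came closer.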

[Figure: scatterplot with regression line for 1972 through 2008, "Summer Olympics: Men’s 400 Meter Dash Winning Times"; horizontal axis: Year (1968 to 2008); vertical axis: Time (seconds) (43.40 to 44.80); regression line y = −0.025x + 93.834, R² = 0.5351]




Technology Tips

To complete the Linear Model portion of the project, you will need to use technology (or hand-drawing) to create a scatterplot, find the regression line, plot the regression line, and find r and r².

Below are some options, together with some videos. Each video is limited to 5 minutes or less. It takes a bit of time for the video to initially download. When playing the video, if you want to slow it down to read the text, hit the pause icon. (If you run the mouse over the bottom of the video screen, the video controls will appear.) You may need to adjust the volume.

The basic options are to:

    (1) Generate by hand and scan.

    (2) Use a free online tool.

        Use the free Desmos calculator: See DesmosLinearRegressionGuide to view how to generate a scatterplot and carry out linear regression.

        The result of the free tool might not be as nice looking as the Microsoft Excel version, but it is free, accurate and easy to use.

    (3) Use Microsoft Excel.

        Visit Scatterplot – Start (VIDEO) to see how to create a scatterplot using Microsoft Excel and format the axes.

        Visit Scatterplot – Regression Line (VIDEO) to see how to add labels and a title to the scatterplot, how to generate and graph the line of best fit (regression line), and how to obtain the value of r² in Microsoft Excel.

        Using Excel to obtain precise values of slope m and y-intercept b of the regression line: Video, Spreadsheet

    (4) Use Open Office.

    (5) Use a hand-held graphing calculator. (See section 2.5 in your textbook for help with Texas Instruments hand-held calculators.)

The Linear Project Example uses Microsoft Excel.


Desmos Graphing Calculator and Linear Regression

You can use the free online Desmos Graphing Calculator to produce a scatterplot and find the regression line and correlation coefficient.

Go to https://www.desmos.com/calculator and launch the calculator.

Select “table” from the menu at the upper left.

Data for Project Example (Men’s 400 Meter Dash) has been entered. Regression help can be accessed via the “?” icon.

Select “expression” from the menu at the upper left.

Type y1 ~ mx1 + b and the values of r², r, m, and b automatically appear.

Selecting the tool at the upper right, you can then adjust the scales on the x and y axes and create labels.

You can give your graph a name. In order to save your graph, sign in with a free account and click the share button. If you share the given link, then by following the link, the graph can be opened and manipulated. If you click the Image button, then you can save the graph as a file.

After clicking the Image button, you can view the graph as a stand-alone image, and select from several options to save.

