statistics
Case study
The district of Springfield conducted an environmental study on freshwater reservoirs in its region. These include lakes, creeks, and public ponds. The study was prompted by recent concerns, voiced by a local environmental protection group, that fish in these reservoirs may be so contaminated by mercury that they are no longer safe for human consumption.
Mercury is a toxic metal that occurs naturally in the environment. At times, however, human activities may result in unnatural releases of mercury into water bodies, which could in turn enter fish. Consuming mercury-contaminated fish can lead to severe neurological and physiological disorders in humans.
Springfield’s officials identified 943 water reservoirs (including natural lakes) that have significant fisheries and are relatively accessible, based on information from a previous survey carried out a decade ago. Of these, a simple random sample of 142 reservoirs was selected for the current study. Fish samples were then collected from only the 122 sampled reservoirs that contained the targeted group of predator fish species. The researchers used set criteria to decide which fish species to target.
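The counts in the paragraph above imply a sampling fraction and a loss of sampled reservoirs that may be worth stating explicitly. The short sketch below only restates that arithmetic; the variable names are illustrative and are not part of the supplied dataset.

```python
# Figures quoted in the case study
frame_size = 943        # reservoirs identified from the earlier survey
sample_size = 142       # reservoirs drawn by simple random sampling
with_target_fish = 122  # sampled reservoirs that contained the target species

sampling_fraction = sample_size / frame_size
yield_rate = with_target_fish / sample_size
excluded = sample_size - with_target_fish

print(f"Sampling fraction: {sampling_fraction:.1%}")
print(f"Sampled reservoirs with target fish: {yield_rate:.1%}")
print(f"Sampled reservoirs without fish data: {excluded}")
```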
Fish were collected by angling, gill nets, trap nets, dip nets or beach seines. Up to 5 fish from the hierarchical order of preferred predator species were obtained. Care was taken to keep fish clean and free of contamination. In the laboratory, the fillet (muscle) of each fish was extracted, and the fillets from each reservoir were ground up, combined and homogenised. The tissue was then subsampled and analysed for mercury.
In addition to collecting fish samples, the officials examined other possible factors that could contribute to elevated mercury levels in fish. They reckoned that this information could be useful for policy making by members of the Springfield legislature.
Following completion of the field study, you were handed a dataset containing 122 records of the studied reservoirs. Each record is described by the following variables:
Reservoir : name of reservoir
Fish : number of fish sampled
Mercury : mercury level from sampled fish in parts per million (ppm)
Elevation : reservoir’s elevation (in feet)
Drainage : drainage area (in square miles). Drainage area is the area of land which collects and drains the rainwater which falls on it, such as the area around a reservoir.
Surface Area : surface area of a reservoir (in acres)
Max. Depth : maximum depth of a reservoir (in feet)
RF : Runoff Factor. Runoff is the amount of rainwater or melted snow which flows into rivers and streams. A higher runoff factor may lead to more surface water from the watershed reaching the reservoir, influencing mercury concentration in fish.
FR : Flushing Rate. Flushing rate is the number of times all water in a reservoir is theoretically exchanged during a year.
Dam : Impoundment class (1 = no functional dam present, all natural flowage; 0 = at least some man-made flowage in the drainage area)
RT : Reservoir Type. Three types of reservoirs are identified (1 = oligotrophic; 2 = eutrophic; 3 = mesotrophic)
RS : Reservoir Stratification. Two indicators are used (1 = reservoir is stratified; 0 = reservoir is not stratified). A reservoir is considered ‘stratified’ if a temperature decrease of ≥1 degree per meter exists with depth.
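The stratification rule for RS can be illustrated with a small sketch. It assumes a temperature profile recorded at discrete depths (in metres) and applies the ≥1 degree per metre rule between adjacent readings. The function name, the input layout and the sample profiles are hypothetical; the dataset itself supplies only the final 0/1 indicator.

```python
def is_stratified(depths_m, temps, threshold=1.0):
    """Return 1 if any adjacent pair of readings cools by at least
    `threshold` degrees per metre of depth, otherwise 0."""
    readings = sorted(zip(depths_m, temps))  # order readings by depth
    for (d1, t1), (d2, t2) in zip(readings, readings[1:]):
        if d2 == d1:
            continue  # skip duplicate depths to avoid dividing by zero
        drop_per_metre = (t1 - t2) / (d2 - d1)
        if drop_per_metre >= threshold:
            return 1
    return 0


# Made-up profiles, for illustration only
print(is_stratified([0, 2, 4, 6], [20.0, 19.5, 16.0, 15.5]))
print(is_stratified([0, 2, 4], [20.0, 19.5, 19.0]))
```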
Dataset
The dataset springfield.data is required to complete this assignment. It can be downloaded from the Assessments > Group Assignment (30%) section on Learnline.
Tasks
To complete this assignment, solve all problems below in your group. You should carefully consider the information given in the preceding case study and exclusively use the supplied dataset for analyses.
Problem A (10%): Data understanding and sampling
1. Describe the population, the sample and the levels of measurement in the given dataset.
2. Discuss the sampling technique used by Springfield officials and its implication(s) for the quality of the collected data.
Problem B (20%): Descriptive statistics
3. Compute descriptive statistics for all eligible variables in the dataset. For quantitative variables, you must at least include the following statistical measures: mean, standard deviation, kurtosis, skewness, range, and five-number summary. Use appropriate statistics for categorical variables.
4. Using the computed statistics and appropriate charts, comment on the distribution of values in each quantitative variable.
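The measures listed for quantitative variables can be cross-checked outside Excel with a short script. The sketch below is only an illustration, not a replacement for the required Excel workbook. It assumes missing values have been read in as `None`, uses the sample-adjusted skewness and kurtosis formulas that Excel's SKEW and KURT are documented to use, and takes quartiles with the inclusive method, which should correspond to QUARTILE.INC. The function name and the sample values are made up.

```python
import statistics


def describe(values):
    """Summary statistics for one quantitative variable (needs n >= 4)."""
    x = sorted(v for v in values if v is not None)  # drop missing values
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    z = [(v - mean) / sd for v in x]
    # Sample-adjusted skewness and excess kurtosis
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "skewness": skew,
        "kurtosis": kurt,
        "range": x[-1] - x[0],
        "five_number": (x[0], q1, median, q3, x[-1]),
    }


# Made-up readings, for illustration only
summary = describe([0.12, 0.45, None, 0.31, 0.77, 1.05, 0.26])
for name, value in summary.items():
    print(name, value)
```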
Problem C (30%): Inferential statistics
Note: in solving questions 5 – 7, you must provide a justification for the chosen statistical method. By referring to the Data Science Roadmap process model, show the step-by-step process of your statistical analyses in the submitted Excel workbook.
5. The national environmental agency determines that fish samples with more than 1.0 ppm are to be considered “Unsafe” because they exceed the safety limit for human consumption. Springfield’s local environmental authority, however, considers samples with more than 0.4 ppm to be at a sufficient level of risk to warrant further action (e.g. issuing a health advisory, banning fishing activities at selected reservoirs, etc.). Based on the given dataset, what are the risk levels of reservoirs in Springfield? Should the local authority take any action?
6. Industrialists who benefit from dams and dam construction are concerned that high mercury levels in fish will be attributed to the presence of dams in a reservoir’s drainage area. Determine whether the data support or refute this claim.
7. A colleague of yours wonders if the flushing rate of a reservoir could have anything to do with its sampled mercury level. Please satisfy her curiosity.
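The two thresholds described in question 5 (more than 1.0 ppm nationally, more than 0.4 ppm locally) can be written as a simple classification rule. The sketch below is illustrative: only the label “Unsafe” comes from the case study, while the other labels, the function name and the sample readings are assumptions. It does not choose the inferential method for you.

```python
from collections import Counter


def risk_level(mercury_ppm):
    """Classify one mercury reading against the two thresholds."""
    if mercury_ppm is None:
        return "Unknown"                # missing reading (assumed label)
    if mercury_ppm > 1.0:
        return "Unsafe"                 # exceeds the national safety limit
    if mercury_ppm > 0.4:
        return "At risk"                # exceeds the local action level (assumed label)
    return "Below local threshold"      # assumed label


# Made-up readings, for illustration only
readings = [0.12, 0.45, 1.12, 0.40, None, 1.00]
print(Counter(risk_level(r) for r in readings))
```

Both comparisons are strict, matching the wording “more than”, so a reading of exactly 1.0 ppm falls in the middle category.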
Problem D (10%): Outlier analysis
8. Using outlier analysis, determine whether any reservoir has an outlying mercury level.
9. With the outlier(s) removed, repeat the statistical analyses performed for Problem C Question 6. Report whether this leads to a different public policy recommendation. Finally, discuss common approaches to dealing with outliers.
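One common way to flag outliers is the 1.5 × IQR fence rule. The assignment does not prescribe a particular method, so the sketch below is just one option, shown under the assumption that each record is a (reservoir name, mercury level) pair with no missing values. The function name and the sample records are hypothetical.

```python
import statistics


def iqr_outliers(records, k=1.5):
    """Return the records whose value lies outside the k * IQR fences."""
    values = [level for _, level in records]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(name, level) for name, level in records
            if level < lower or level > upper]


# Made-up records, for illustration only
sample = [("A", 0.2), ("B", 0.3), ("C", 0.4), ("D", 0.5), ("E", 3.0)]
print(iqr_outliers(sample))
```

Other conventions, such as z-score cut-offs, can flag a different set of points, which is part of what question 9 asks you to discuss.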
Problem E (30%): Data visualisation and storytelling
10. You are called to brief the members of the Springfield legislature on the results of your statistical analyses above. Your audience are not only concerned about the impact of mercury on community health and local tourism industries, but also need to know whether certain public policies need to be actioned immediately. Note that most of your audience have no background in statistics.
Complete this task by:
- Producing a single Power BI dashboard containing relevant data visualisations. This dashboard must be uploaded to Power BI service and shared with your instructor and the teaching support team.
- Recording a 5-minute PowerPoint presentation, utilising selected data visualisations found in your Power BI dashboard.
- Explaining in detail in your report how you applied at least one principle of effective data visualisation and storytelling when planning the Power BI dashboard and delivering the presentation.
Overview
This is a group assignment. It assesses your achievement of the following learning outcomes:
LO1: Present and describe information effectively
LO3: Draw conclusions about populations using sample information
LO4: Suggest ways to improve decision making processes
LO5: Obtain reliable forecasts of variables of interest
In this assignment, you are required to produce several statistical analyses for a given case study. The analyses must be carried out using Microsoft Excel and Microsoft Power BI. For more details, refer to the questions and tasks on the next page.
Submission items
You are required to submit the following:
a. One written group report. Length: not more than 12 pages (A4-size, 11pt font size, single space). The report should address all questions in this assignment.
b. One Excel workbook (.xlsx file), consisting of 7 worksheets. Each worksheet shows the working for one of questions 3 – 9.
c. A one-page Power BI dashboard uploaded to Power BI service and shared with your Unit Coordinator and the teaching support team. This addresses question 10.
d. One recording of the PowerPoint presentation, saved in either .mp4 or .wmv format.
Items a, b and d must be compressed together into a single .zip file. Rename the .zip file with your group number (e.g. Group1.zip).
Reservoir | Fish | Mercury | Elevation | Drainage Area | Surface Area | Max Depth | RF | FR | Dam | RT | RS
Abilene
3
0
5
1
9
6
7
4
36
0.
44
1.1
Abraham Lake
0.3
3
45
15
3
60
8
0.5
1.7
Academy
0.
54
4
87
17
10
0.
57
1.5
Acadia
0.2
1
65
13
55
43
100
0.51
Acadia Valley
0.9
1
27
70
106
0.6
4.3
Acheson
0.36
11
50
0.56
2.9
Acme
0.
19
46
Adair Creek
0.
21
4
38
1
14
0.59
Adams Creek
0.
28
29
22
14
34
12
Bangs Lake
0.
23
270
20
NA
Bank Bay
0.
62
717
272
2.7
Bank Creek
0.4
5
74
76
5
58
136
3.2
Bankfoot Creek
0.05
1
39
41
0.
66
19.4
Bankhead
0.
68
31
1
53
0.57
1.8
Bankview
362
239
0.06
1.2
Bannerman
0.1
2
96
0.
47
Barnegat
0.43
63
227
150
0.53
Barnes Ridge
4
16
1
568
158
Barnett Lake
0.071
212
0.62
6.8
Barnwell
0.14
1045
9
80
0.
61
5.4
Fork Lake
0.29
203
4
30
20.1
Fort Creek
0.16
1
37
0.61
2.3
Fox Creek
115
201
42
Gadois Lake
0.
25
1
24
713
0.46
Gap Creek
0.11
1
18
3
88
85
Gardiner Creek
0.22
519
122
Geikie Lake
0.18
58.8
Geraldine Lakes
91
Gerard Creek
0.41
913
225
90
7.3
Gertrude Lake
319
1574
92
Giants Mirror
0.
26
1
48
0.7
0.8
Halach
882
630
Halcreek
6
83
93
18.9
Halfway Creek
101
49
Halifax Coulee
0.13
16
33
0.71
Hamilton Hill
0.25
824
40
0.
52
Hamptons
109
Ice Water Creek
0.58
247
102
Junction Creek
0.82
204
50.9
Jutland Brook
0.45
419
605
Kakina Lake
922
298
2923
5.8
Kakut Creek
0.34
929
390
2.2
Kamisak Lake
0.37
392
Katchemut Creek
0.09
1209
280
0.76
Kaufmann Creek
0.77
311
564
Keane Creek
75
331
Keith Lake
0.27
397
7.9
Owl Lake
221
282
0.49
6.2
Owlseye Lake
377
Oxbow Lake
454
202
Oxley Creek
0.23
177
Pair Lakes
1690
64.1
Pans Lake
637
25.2
Paradise Basin
67
1
35
5.9
Paradise Creek
0.79
417
123
6944
Parker Lake
2.5
0.63
3.9
Parlby Lake
271
10.1
Parsons Creek
1235
182
2627
13.1
Partridge Lake
0.86
234
0.66
12.6
Pasque Creek
0.31
446
576
Pastecho Lake
0.91
500
685
Plante Creek
0.47
778
638
4.2
Plover Lake
0.025
1494
0.69
Poboktan Creek
1.12
1539
4.7
Poison Creek
1.08
425
2.8
Pony Creek
32
315
Popular Point Lake
0.94
263
1823
Stimson Creek
0.38
141
712
Stirling Lake
2035
Stones Canyon Creek
0.73
524
126
Stony Woman Creek
0.75
205
200
9.6
Stormy Creek
350
Stouffers Lake
146
Stove Lake
269
134
2.1
Strawberry Lake
499
3.3
Stronach Lake
819
403
9.5
Sturgeon Lake
0.96
8.2
Swoda Creek
639
197
10.5
Sylvester Creek
474
161
0.55
Tail Creek
0.67
1097
164
Talbot Lake
0.24
653
588
Tamarack Lake
116
250
4
3.7
Tate Creek
Tattum Lake
0.48
232
339
Tawayik Lake
995
0.54
28.6
Taylor Lake
0.35
195
1.3
Telegraph Creek
751
7.4
Temple Creek
1.22
276
210
1.4
Tepee Creek
683
Tetley Creek
0.32
170
676
The Snowbowl
354
1050
Thinahtea Creek
367
Third Lake
0.64
1.9
Thompson Lake
328
Thoreau Creek
304
118
Two Dam Creek
1.25
306
290
Two O’Clock Creek
1201
1.6
Tyler Lake
1.05
281
Underwood Lake
312
Updike Lake
1203
Upper Bertha Falls
557
704
Upper Mann Lake
0.21
1157
3053
20.3
Upper Thérien Lake
0.81
449
2
4.5
Utikuma Lake
374
Valhalla Lake
357
1120
Valley Creek
0.39
578
2515
Vokes Lake
1271
970
Volcano Creek
7865
Wabash Creek
1245
600
7850
9.2
Waddell Creek
0.12
180
660
Wadlin Lake
0.08
904
17.5
Wagon Creek
510
Wallaby Lake
1409
Wildman Creek
190
Wilson Creek
0.89
307
Winnifred Lake
Wood Buffalo Lake
402
Yak Lakes
341
467
Yellowstone Creek
532
14.2
Zama Lake
1503
211
Zephyr Creek
1700
3.8
Zig Zag Lake
846
422
45.2