Descriptive Statistics Data Analysis Plan
Scenario: The sample data were gathered from 30 households chosen at random from the U.S. Department of Labor’s 2016 Consumer Expenditure Surveys. The survey provides both socioeconomic and expenditure variables for each household.
(You need a plan describing what you would like to learn from the data set available to you and how you will do it.)
Table 1. Variables Selected for the Analysis
Variable Name in the Data Set | Description (See the data dictionary for describing the variables.) | Type of Variable (Qualitative or Quantitative)
Variable 1: “Income” | Annual household income in USD | Quantitative
Variable 2: “Expenditures” | Annual household expenditures in USD | Quantitative
Variable 3: “Housing” | Annual household housing costs in USD | Quantitative
Variable 4: “Electricity” | Annual household electricity costs in USD | Quantitative
Variable 5: “Water” | Annual household water costs in USD | Quantitative
Reason(s) for Selecting the Variables and Expected Outcome(s):
1. Variable 1: “Income” – To show how much income the household receives annually
2. Variable 2: “Expenditures” – To show total annual household expenditures
3. Variable 3: “Housing” – To show annual housing expenses
4. Variable 4: “Electricity” – To show annual electricity expenses
5. Variable 5: “Water” – To show annual water expenses
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you chose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 2. Numerical Summaries of the Selected Variables
Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate
Variable 1: “Income”
● Number of Observations
● Median
● Sample Standard Deviation
I am using the median for two reasons:
1. The median is resistant to outliers and skew, so it is the better measure of central tendency when the data contain outliers or are not normally distributed.
2. The variable is quantitative.
I am using the sample standard deviation for three reasons:
1. The data are a sample from a larger data set.
2. It is the most commonly used measure of dispersion.
3. The variable is quantitative.
Variable 2: “Expenditures”
● Number of Observations
● Median
● Sample Standard Deviation
I am using the median for two reasons:
1. The median is resistant to outliers and skew, so it is the better measure of central tendency when the data contain outliers or are not normally distributed.
2. The variable is quantitative.
I am using the sample standard deviation for three reasons:
1. The data are a sample from a larger data set.
2. It is the most commonly used measure of dispersion.
3. The variable is quantitative.
Variable 3: “Housing”
● Number of Observations
● Median
● Sample Standard Deviation
I am using the median for two reasons:
1. The median is resistant to outliers and skew, so it is the better measure of central tendency when the data contain outliers or are not normally distributed.
2. The variable is quantitative.
I am using the sample standard deviation for three reasons:
1. The data are a sample from a larger data set.
2. It is the most commonly used measure of dispersion.
3. The variable is quantitative.
Variable 4: “Electricity”
● Number of Observations
● Median
● Sample Standard Deviation
I am using the median for two reasons:
1. The median is resistant to outliers and skew, so it is the better measure of central tendency when the data contain outliers or are not normally distributed.
2. The variable is quantitative.
I am using the sample standard deviation for three reasons:
1. The data are a sample from a larger data set.
2. It is the most commonly used measure of dispersion.
3. The variable is quantitative.
Variable 5: “Water”
● Number of Observations
● Median
● Sample Standard Deviation
I am using the median for two reasons:
1. The median is resistant to outliers and skew, so it is the better measure of central tendency when the data contain outliers or are not normally distributed.
2. The variable is quantitative.
I am using the sample standard deviation for three reasons:
1. The data are a sample from a larger data set.
2. It is the most commonly used measure of dispersion.
3. The variable is quantitative.
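As a rough sketch of how the summaries planned in Table 2 could be computed, the Python snippet below uses the standard library's statistics module. The income values are made-up placeholders, not figures from the survey sample, and the function name summarize is hypothetical.

```python
import statistics

def summarize(values):
    """Return the count, median, and sample standard deviation of a list of numbers."""
    return {
        "n": len(values),
        "median": statistics.median(values),
        # statistics.stdev divides by n - 1, i.e. it is the sample standard deviation
        "sample_sd": statistics.stdev(values),
    }

# Hypothetical annual incomes in USD (illustrative only, not survey data)
incomes = [42000, 55000, 61000, 58000, 250000]

summary = summarize(incomes)
print(summary)

# Compare the mean and the median when one value is much larger than the rest
print(statistics.mean(incomes), statistics.median(incomes))
```

In this made-up list, the single very large income pulls the mean well above the median, which is the situation the median rationale in Table 2 is meant to guard against. The same function could be applied to each of the five selected variables.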
Graphs and/or Tables
Complete Table 3. Type of Graphs and/or Tables for Selected Variables and briefly explain why you chose those graphs and/or tables. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 3. Type of Graphs and/or Tables for Selected Variables
Variable Name | Graph and/or Table | Rationale for Why Appropriate
Variable 1: “Income” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including any skew or outliers.
Variable 2: “Expenditures” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including any skew or outliers.
Variable 3: “Housing” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including any skew or outliers.
Variable 4: “Electricity” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including any skew or outliers.
Variable 5: “Water” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including any skew or outliers.
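To illustrate what the histograms planned in Table 3 summarize, here is a minimal text-only sketch in Python that groups values into equal-width bins and prints one row per bin. It assumes a fixed bin width chosen by hand. The water-cost values are invented for illustration and are not survey data, and the helper name histogram_counts is hypothetical. In practice a plotting tool would draw the bars.

```python
def histogram_counts(values, bin_width):
    """Count how many values fall into each equal-width bin.

    Bins start at the multiple of bin_width at or below the smallest value.
    Returns a list of (bin_start, count) pairs, including empty bins.
    """
    start = (min(values) // bin_width) * bin_width
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, c) for i, c in enumerate(counts)]

# Hypothetical annual water costs in USD (illustrative only, not survey data)
water = [310, 420, 450, 480, 520, 560, 610, 980]
width = 200

bins = histogram_counts(water, width)
for bin_start, count in bins:
    # One '#' per household in the bin
    print(f"{bin_start:>5}-{bin_start + width - 1:<5} {'#' * count}")
```

Reading the rows from top to bottom shows where the values cluster and whether a long tail or isolated high value is present, which is the information the histogram is meant to convey for each variable.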