Data Analysis and Visualization

  • Use the tool of your choice (RStudio, Excel, Python) to produce a Word document with a basic data analysis of the dataset posted in the Week 2 content folder.
  • Create a Word document that includes the screen shots described below.

Questions/Requests:

  • Create a summary of statistics for the dataset. (provide a screen shot)
  • Create a correlation matrix for the dataset. (provide a screen shot)
  • What are the Min, Max, Median, and Mean of the Price? (provide a screen shot)
  • What are the correlation values between Price, Ram, and Ads? (provide a screen shot)
  • Create a subset of the dataset with only Price, CD, and Premium. (provide a screen shot)
  • Create a subset of the dataset with only Price, HD, and Ram where Price is greater than or equal to $1750. (provide a screen shot)
  • What percentage of Premium computers were sold? (provide a screen shot) (Hint: Categorical analysis)
  • How many Premium computers with CDs were sold? (provide a screen shot) (Hint: Contingency table analysis)
  • How many Premium computers with CDs priced over $2000 were sold? (provide a screen shot) (Hint: Conditional table analysis)
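The requests above can be sketched in R, the language used in the slides that follow. This is a minimal sketch, not an answer key: the data frame below is invented purely for illustration, and the column names (Price, HD, Ram, CD, Premium, Ads) and the "yes"/"no" coding of CD and Premium are assumptions taken from the wording of the questions. With the real file you would load it first (for example with read.csv()) and drop the invented values.

```r
# Invented stand-in for the Week 2 dataset (these values are NOT the real data).
# Column names and "yes"/"no" coding are assumptions; adjust to the actual file.
computers <- data.frame(
  Price   = c(1499, 1795, 2225, 2575, 1750, 2995, 1999, 2050),
  HD      = c(80, 85, 170, 210, 120, 340, 212, 245),
  Ram     = c(4, 2, 4, 8, 4, 16, 8, 8),
  CD      = c("no", "no", "yes", "yes", "no", "yes", "yes", "no"),
  Premium = c("yes", "yes", "yes", "no", "yes", "yes", "yes", "yes"),
  Ads     = c(94, 94, 94, 94, 212, 212, 139, 139),
  stringsAsFactors = FALSE
)

# Summary statistics for every column
summary(computers)

# Correlation matrix of the numeric columns only
numeric_cols <- computers[sapply(computers, is.numeric)]
cor(numeric_cols)

# Min, Max, Median and Mean of Price
price_stats <- c(min = min(computers$Price), max = max(computers$Price),
                 median = median(computers$Price), mean = mean(computers$Price))
price_stats

# Correlations between Price, Ram and Ads
cor(computers[, c("Price", "Ram", "Ads")])

# Subset: Price, CD and Premium only
sub_cd <- computers[, c("Price", "CD", "Premium")]

# Subset: Price, HD and Ram where Price >= 1750
sub_high <- subset(computers, Price >= 1750, select = c(Price, HD, Ram))

# Categorical analysis: percentage of Premium vs non-Premium machines
premium_pct <- prop.table(table(computers$Premium)) * 100
premium_pct

# Contingency table of Premium by CD, plus the single count asked for
table(Premium = computers$Premium, CD = computers$CD)
n_premium_cd <- sum(computers$Premium == "yes" & computers$CD == "yes")

# Conditional table: the same cross-tabulation restricted to Price > 2000
over_2000 <- subset(computers, Price > 2000)
table(Premium = over_2000$Premium, CD = over_2000$CD)
n_premium_cd_2000 <- sum(over_2000$Premium == "yes" & over_2000$CD == "yes")
```

Each printed result is what would go into a screen shot; swapping in the real data changes the numbers but not the calls.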

Your document should be an easy-to-read font in MS Word. Your cover page should contain the following: Title, Student’s name, University’s name, Course name, Course number, Professor’s name, and Date.

Analyzing and Visualizing Data

Chapter 4
Working With Data

Data Assets and Tabulation Types


• Two main categories
o Data that exist in tables; Datasets
o Data that exist as isolated values

• Data Types
o Also called levels of data or scales of measurement
o The data type influences:
▪ The type of exploratory data analysis you can undertake
▪ The editorial thinking you establish
▪ The specific chart types you might use
▪ The color choices and layout decisions around composition

Data Assets and Tabulation Types cont.

• Textual (Qualitative)
o Unstructured streams of words
o Descriptive details of a weather forecast for a given city
o The full title of an academic research project
o The description of a product on Amazon

Data Assets and Tabulation Types cont.

• Ordinal (Qualitative)
o Ordinal data is still categorical and qualitative in nature
o Its categories have a natural order
o The response to a survey question: based on a scale of 1 (unhappy) to 5 (very happy)
o The general weather forecast: expressed as Very Hot, Hot, Mild, Cold, Freezing
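In R, ordinal data is usually stored as an ordered factor, which keeps the categories in their meaningful sequence so that comparisons and sorting respect the order. A small sketch using the weather-forecast categories above; the individual observations are invented.

```r
# Invented forecast observations, stored as an ordered factor
forecast <- factor(
  c("Mild", "Hot", "Freezing", "Very Hot", "Mild"),
  levels  = c("Freezing", "Cold", "Mild", "Hot", "Very Hot"),  # coldest to hottest
  ordered = TRUE
)

forecast[1] < forecast[2]   # Mild is below Hot in the defined order
max(forecast)               # the highest category observed
table(forecast)             # counts in level order, including the unused "Cold"
```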

Data Assets and Tabulation Types cont.

• Interval (Quantitative)
o Interval data is the less common form of quantitative data
o Quantitative and numeric measurement
o Temperature measured in degrees Celsius, where zero is an arbitrary point rather than an absence of temperature

Data Assets and Tabulation Types cont.

• Ratio (Quantitative)
o Most common quantitative variable
o Age of a survey participant in years
o Forecasted amount of rainfall in millimetres
o Unlike interval data, ratio data has a true zero: 0 mm of rainfall means no rain at all
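A quick worked example of why ratios are only meaningful for ratio data. The temperatures and rainfall amounts are arbitrary illustrations: the same two temperatures give a different "ratio" in Fahrenheit than in Celsius, whereas 20 mm of rain is twice 10 mm whatever unit you use.

```r
# Interval scale: the ratio depends on where the scale puts its zero
celsius    <- c(10, 20)
fahrenheit <- celsius * 9 / 5 + 32        # 50 and 68
celsius[2] / celsius[1]                   # 2
fahrenheit[2] / fahrenheit[1]             # 1.36, so "twice as hot" is not meaningful

# Ratio scale: a true zero means the ratio survives a change of units
rain_mm <- c(10, 20)
rain_in <- rain_mm / 25.4                 # millimetres to inches
rain_mm[2] / rain_mm[1]                   # 2
rain_in[2] / rain_in[1]                   # still 2
```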

Data Assets and Tabulation Types cont.

• Temporal Data
o Time-based data
o Textual: ‘Four o’clock in the afternoon on Saturday, 12 March 2016’
o Ordinal: ‘PM’, ‘Afternoon’, ‘March’, ‘Q1’
o Interval: ‘12’, ‘12/03/2016’, ‘2016’
o Ratio: ‘16:00’
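One timestamp can be expressed at several of these levels. A sketch using base R date-time functions on the example moment above (the UTC time zone is an assumption); month and AM/PM names are left out because their text depends on the system locale.

```r
# The example moment: 4 pm on 12 March 2016 (time zone assumed to be UTC)
ts <- as.POSIXct("2016-03-12 16:00:00", tz = "UTC")

quarters(ts)              # "Q1": an ordinal category
format(ts, "%d/%m/%Y")    # "12/03/2016": a date (interval)
format(ts, "%Y")          # "2016": a year (interval)
format(ts, "%H:%M")       # "16:00": a clock time
as.POSIXlt(ts)$wday       # 6, i.e. Saturday (0 = Sunday)
```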

Data Assets and Tabulation Types cont.

• Discrete
o No ‘in-between’ state
o Days of the week
o Heads or tails for a coin toss
o 1,2,3,4,5,6,etc.

• Continuous
o Has in-between state
o Height and weight
o Temperature
o Time
o 1.1,1.2,1.3,1.4,1.5,etc.

Data Acquisition

• What data do you need and why?
• From where, how, and by whom will the data be acquired?
• When can you obtain it?

Data Acquisition cont.

• Curated by You
o Primary data collection
o Manual collection and data foraging
o Extracted from PDF files
o Web scraping (also known as web harvesting)

Data Acquisition cont.

• Curated by Others
o Issued to you
o Download from the Web
o System report or export
o Third-party services
o API

Data Examination

• Data Properties
o Data types
o Size
o Condition

▪ Missing values
▪ Erroneous values
▪ Inconsistencies
▪ Duplicate records
▪ Out of date
▪ Uncommon system characters or line breaks
▪ Leading or trailing spaces
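Several of the condition checks listed above map onto a few base R calls. A minimal sketch on a small invented data frame; treating a negative price as an erroneous value is an assumption made for the example.

```r
# Invented data with deliberate problems
df <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  city  = c("Leeds", " York", " York", "Hull ", NA),
  price = c(1500, 2200, 2200, -1, 1800),
  stringsAsFactors = FALSE
)

n_missing <- colSums(is.na(df))                          # missing values per column
n_dup     <- sum(duplicated(df))                         # duplicate records (whole rows)
n_err     <- sum(df$price < 0, na.rm = TRUE)             # erroneous values (assumed rule)
n_spaced  <- sum(df$city != trimws(df$city), na.rm = TRUE)  # leading/trailing spaces

df$city <- trimws(df$city)                               # clean the spacing problem
```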

Data Examination cont.

• How to Approach This?
o Inspect and scan
o Data operations
o Statistical methods
o Frequency counts
o Frequency distribution
o Measurements of central tendency
o Measurements of spread
o Maximum, minimum and range
o Percentiles
o Standard deviation
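The statistical methods listed above are each a one-line call in base R. A sketch on an invented vector; note that R's sd() uses the sample (n - 1) denominator and quantile() defaults to its "type 7" method.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # invented values

table(x)                                      # frequency counts
freq_dist <- table(cut(x, breaks = c(0, 3, 6, 9)))  # frequency distribution in bins
freq_dist
mean(x)                                       # central tendency: mean
median(x)                                     # central tendency: median
names(which.max(table(x)))                    # most frequent value (mode)
range(x)                                      # minimum and maximum
diff(range(x))                                # range as a single number
quantile(x, c(0.25, 0.75))                    # percentiles
sd(x)                                         # sample standard deviation
```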

Influence on Process

• Moving forward
o Purpose map ‘tone’
o Editorial angles
o Physical properties influence scale

Data Transformation

• Potential Activities
o Transform to clean
o Transform to convert
o Transform to create
o Transform to consolidate
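One way to picture the four transformation activities in base R. Everything here is hypothetical: the column names, the currency text format and the 1750 threshold are invented for the illustration.

```r
raw <- data.frame(model = c(" A1", "B2 ", "C3"),
                  price = c("$1,750", "$2,100", "$999"),
                  stringsAsFactors = FALSE)

# Transform to clean: strip stray spaces
raw$model <- trimws(raw$model)

# Transform to convert: currency text to numbers ("$" and "," removed)
raw$price <- as.numeric(gsub("[$,]", "", raw$price))

# Transform to create: derive a new variable from an existing one
raw$price_band <- ifelse(raw$price >= 1750, "high", "standard")

# Transform to consolidate: join with another (invented) table
extra  <- data.frame(model = c("A1", "B2", "C3"), ram = c(8, 16, 4),
                     stringsAsFactors = FALSE)
merged <- merge(raw, extra, by = "model")
merged
```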

Data Exploration

• Exploratory Data Analysis
o Instinct of the analyst
o Reasoning

▪ Deductive
▪ Inductive

o Chart types
o Research
o Statistical methods
o Nothings: finding nothing of interest is still a valid result
o Not always needed

How to Use the R Programming Language for Statistical Analyses
Part I: An Introduction to R

What Is R?

◼ a programming “environment”

◼ object-oriented

◼ similar to S-Plus

◼ free, open-source software

◼ provides calculations on matrices

◼ excellent graphics capabilities

◼ supported by a large user network

What is R Not?

◼ a statistics software package

◼ menu-driven

◼ quick to learn

◼ a program with a complex graphical interface

Installing R

◼ www.r-project.org/

◼ download from CRAN

◼ select a download site

◼ download the base package at a minimum

◼ download contributed packages as needed


Tutorials

◼ From R website under “Documentation”

– “Manual” is the listing of official R documentation

• An Introduction to R

• R Language Definition

• Writing R Extensions

• R Data Import/Export

• R Installation and Administration

• The R Reference Index

Tutorials cont.

– “Contributed” documentation consists of tutorials and manuals created by R users

• Simple R

• R for Beginners

• Practical Regression and ANOVA Using R

– R FAQ

– Mailing Lists (listserv)

• r-help

Tutorials cont.

◼ Textbooks

– Venables & Ripley (2002). Modern Applied Statistics with S. New York: Springer-Verlag.

– Chambers (1998). Programming With Data: A Guide to the S Language. New York: Springer-Verlag.

R Basics

◼ objects

◼ naming convention

◼ assignment

◼ functions

◼ workspace

◼ history

Objects

◼ names

◼ types of objects: vector, factor, array, matrix, data.frame, ts, list

◼ attributes

– mode: numeric, character, complex, logical

– length: number of elements in object

◼ creation

– assign a value

– create a blank object
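A short sketch of the attributes and creation methods above, using base R only; the object names are arbitrary.

```r
v <- c(1.5, 2, 3)            # create by assigning values
mode(v)                      # "numeric"
length(v)                    # 3 elements

mode(c("a", "b"))            # "character"
mode(TRUE)                   # "logical"
mode(1 + 2i)                 # "complex"

f <- factor(c("low", "high", "low"))
class(f)                     # "factor"
levels(f)                    # "high" "low" (sorted alphabetically by default)

empty <- numeric(0)          # create a blank numeric object
length(empty)                # 0
```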

Naming Convention

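◼ must start with a letter (A-Z or a-z), or with a period “.” not followed by a digit

◼ can contain letters, digits (0-9), periods “.”, and/or underscores “_”

◼ case-sensitive

– mydata different from MyData

◼ note: very old versions of R used “_” for assignment, so older guides advise against it; in current R it is valid in names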
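The naming rules can be checked directly: base R's make.names() converts any string into a syntactically valid name, so it shows which names already pass. A small sketch; the variable names are arbitrary.

```r
mydata <- 1
MyData <- 2
mydata == MyData          # FALSE: names are case-sensitive
my_data <- 3              # underscores are accepted in current R

# Invalid characters become ".", and an "X" is prepended to names starting with a digit
make.names(c("my data", "2x", "my_data", ".hidden"))
```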

Assignment

◼ “<-” used to indicate assignment

– x <- c(1, 2, 3, 4, 5, 6, 7)

– x <- c(1:7)

– x <- 1:4

◼ note: as of version 1.4 “=” is also a valid assignment operator
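The assignment forms above produce the same values, with one subtlety worth knowing: c(1, 2, ...) builds a double vector while 1:7 builds an integer vector, so the values compare equal but the objects are not identical.

```r
x1 <- c(1, 2, 3, 4, 5, 6, 7)   # double vector
x2 <- c(1:7)                   # integer vector; the c() wrapper is redundant here
x3 = 1:7                       # "=" also assigns at the top level

all(x1 == x2)        # TRUE: same values
identical(x1, x2)    # FALSE: double vs integer storage
identical(x2, x3)    # TRUE
```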

Functions

◼ actions can be performed on objects using functions (note: a function is itself an object)

◼ functions have arguments and options, often with defaults

◼ functions provide a result

◼ parentheses () are used to specify that a function is being called
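A sketch of these points: a user-defined function with a default argument, and a built-in function whose option has a default. The helper scale_by() is hypothetical, written only for this example.

```r
# Hypothetical helper: multiply a vector by a constant (default 2)
scale_by <- function(x, by = 2) {
  x * by
}

scale_by(1:3)                     # 2 4 6, using the default for "by"
scale_by(1:3, by = 10)            # 10 20 30
is.function(scale_by)             # TRUE: the function is itself an object

mean(c(1, NA, 3))                 # NA, because na.rm defaults to FALSE
mean(c(1, NA, 3), na.rm = TRUE)   # 2
```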

Let’s look at R

R Workspace & History

Workspace

◼ during an R session, all objects are stored in a temporary, working memory

◼ list objects

– ls()

◼ remove objects

– rm()

◼ objects that you want to access later must be saved in a “workspace”

– from the menu bar: File->save workspace

– from the command line: save(x, file = "MyData.Rdata")
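A round trip through the workspace commands above. To avoid leaving files behind, this sketch saves to R's temporary directory rather than to MyData.Rdata in the working directory; otherwise the calls are the ones on the slide plus load(), which reads the file back.

```r
x <- 1:5
y <- "temporary"
ls()                                   # lists the objects in the workspace

rm(y)                                  # remove an object
exists("y", inherits = FALSE)          # FALSE

path <- file.path(tempdir(), "MyData.Rdata")
save(x, file = path)                   # save x for later
rm(x)
load(path)                             # restores x from the file
x
```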

History

◼ command line history

◼ can be saved, loaded, or displayed

– savehistory(file = "MyData.Rhistory")

– loadhistory(file = "MyData.Rhistory")

– history(max.show=Inf)

◼ during a session you can use the arrow keys to review the command history

Two most common object types for statistics:

matrix

data frame

Matrix

◼ a matrix is a vector with an additional attribute (dim) that defines the number of rows and columns

◼ only one mode (numeric, character, complex, or logical) allowed

◼ can be created using matrix()

x <- matrix(data = 0, nrow = 2, ncol = 2)

or

x <- matrix(0, 2, 2)
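A sketch of the two matrix properties above: a matrix is just a vector plus a dim attribute, and mixing modes forces every element into a single mode.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
dim(m)                                  # 2 3
m[2, 3]                                 # 6 (row 2, column 3)

v <- 1:6
dim(v) <- c(2, 3)                       # adding a dim attribute turns the vector into a matrix
identical(v, m)                         # TRUE

mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
mode(mixed)                             # "character": all elements coerced to one mode

z <- matrix(data = 0, nrow = 2, ncol = 2)   # the 2 x 2 zero matrix from the slide
```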

Data Frame

◼ several modes allowed within a single data frame

◼ can be created using data.frame()

L <- LETTERS[1:4] # A B C D

x <- 1:4 # 1 2 3 4

data.frame(x, L) # create data frame

◼ attach() and detach()

– the database is attached to the R search path so that the database is searched by R when it is evaluating a variable.

– objects in the database can be accessed by simply giving their names
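A sketch of a data frame holding two modes, and of attach()/detach(). It also shows with(), a base R alternative (not mentioned on the slide) that evaluates an expression inside the data frame without changing the search path; the column names are arbitrary.

```r
df <- data.frame(num = 1:4, let = LETTERS[1:4], stringsAsFactors = FALSE)
sapply(df, class)          # "integer" and "character": two modes in one data frame
df$num                     # access a column by name

with(df, sum(num))         # 10, without attaching anything

attach(df)                 # put df's columns on the search path
total <- sum(num)          # num is found by name alone
detach(df)                 # remove it again to avoid masking later variables
total
```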

Data Elements

◼ select only one element

– x[2]

◼ select range of elements

– x[1:3]

◼ select all but one element

– x[-3]

◼ slicing: including only part of the object

– x[c(1,2,5)]

◼ select elements based on logical operator

– x[x > 3]
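The selection patterns above applied to one invented vector, so each result can be compared side by side:

```r
x <- c(5, 1, 8, 3, 9)

x[2]           # one element: 1
x[1:3]         # a range: 5 1 8
x[-3]          # all but the third: 5 1 3 9
x[c(1, 2, 5)]  # chosen positions: 5 1 9
x[x > 3]       # logical condition: 5 8 9
```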

Data Import & Entry

Importing Data

◼ read.table()

– reads in data from an external file

◼ data.entry()

– create object first, then enter data

◼ c()

– concatenate

◼ scan()

– prompted data entry

◼ R can connect to databases and other programs through ODBC (via contributed packages such as RODBC)
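A sketch of the non-interactive import routes above. To stay self-contained it first writes a tiny invented CSV to a temporary file, then reads it back; data.entry() is omitted because it opens an interactive editor.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("price,ram", "1499,4", "2250,8"), path)   # invented two-row file

dat <- read.table(path, header = TRUE, sep = ",")      # read.csv(path) is the shorthand
str(dat)

v <- c(3, 1, 2)                          # c(): concatenate values directly
s <- scan(text = "4 5 6", quiet = TRUE)  # scan() reading from a string instead of the console
```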

Data entry & editing

◼ start editor and save changes

– data.entry(x)

◼ start editor, changes not saved

– de(x)

◼ start text editor; edit() returns an edited copy, so assign the result to keep the changes

– x <- edit(x)
