Introduction to BIG DATA Midterm Exam Solution
QUESTION 1
What are the three characteristics of Big Data, and what are the main considerations in processing Big Data?
QUESTION 2
Explain the differences between BI and Data Science.
QUESTION 3
Briefly describe each of the four classifications of Big Data structure types (i.e., from structured to unstructured).
QUESTION 4
List and briefly describe each of the phases in the Data Analytics Lifecycle.
QUESTION 5
In which phase of the Data Analytics Lifecycle would the team expect to invest most of the project time? Why? In which phase would the team expect to spend the least time?
QUESTION 6
Which R command would create a scatterplot for the dataframe “df”, assuming df contains values for x and y?
QUESTION 7
What is a rug plot used for in a density plot?
QUESTION 8
What is a type I error? What is a type II error? Is one always more serious than the other? Why?
QUESTION 9
Why is K-means clustering considered an unsupervised machine learning algorithm?
QUESTION 10
Detail the four steps in the K-means clustering algorithm.
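As a study aid, the sketch below shows one common four-step formulation of K-means: pick initial centroids, assign each point to its nearest centroid, recompute the centroids, and repeat until the assignments stop changing. It is an illustrative sketch, not the official solution. The function name `kmeans`, the random choice of starting centroids, and the `max_iter` and `seed` parameters are assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means sketch for points given as equal-length tuples of numbers."""
    rng = random.Random(seed)

    # Step 1: choose k initial centroids (assumption: k distinct points picked at random).
    centroids = rng.sample(points, k)
    assignments = None

    for _ in range(max_iter):
        # Step 2: assign each point to the centroid with the smallest squared Euclidean distance.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]

        # Step 4: stop once no point changes cluster (convergence).
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centroids, assignments


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
```

On this small, well-separated dataset the first three points should end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random starting centroids.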
QUESTION 11
List three popular use cases of the Association Rules mining algorithms.
QUESTION 12
Define Support and Confidence.
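For reference, the sketch below computes the two measures under their usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The function names and the toy grocery transactions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs).

    Assumes lhs occurs in at least one transaction; otherwise the ratio is undefined.
    """
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical toy data: four shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(transactions, {"bread", "milk"})       # 2 of 4 baskets -> 0.5
c = confidence(transactions, {"bread"}, {"milk"})  # 0.5 / 0.75 -> about 0.667
```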
QUESTION 13
How do you use a “hold-out” dataset to evaluate the effectiveness of the rules generated?
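One way to picture the hold-out idea: rules are mined from a training portion of the transactions, and each rule is then re-scored on transactions that were set aside, to see whether it still meets the chosen thresholds there. The sketch below covers only the re-scoring step. The helper names, the toy hold-out baskets, and the threshold values are assumptions chosen for illustration, not values from the course.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs on the given transactions."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(lhs <= set(t) for t in transactions)
    n_both = sum((lhs | rhs) <= set(t) for t in transactions)
    rule_support = n_both / len(transactions)
    rule_confidence = n_both / n_lhs if n_lhs else 0.0
    return rule_support, rule_confidence


def rule_holds(holdout, lhs, rhs, min_support, min_confidence):
    """True if the rule still meets both thresholds on the hold-out transactions."""
    rule_support, rule_confidence = rule_metrics(holdout, lhs, rhs)
    return rule_support >= min_support and rule_confidence >= min_confidence


# Hypothetical hold-out baskets that were NOT used to generate the rule.
holdout = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]

# Rule bread -> milk on the hold-out set: support 2/4, confidence 2/3.
passes_loose = rule_holds(holdout, {"bread"}, {"milk"}, min_support=0.4, min_confidence=0.6)
passes_strict = rule_holds(holdout, {"bread"}, {"milk"}, min_support=0.4, min_confidence=0.7)
```

A rule that clears the thresholds on the training data but fails them on the hold-out data is a candidate for having been found by chance.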
QUESTION 14
List two use cases of linear regression models.
QUESTION 15
Compare and contrast linear and logistic regression methods.