IT Questions
Assignment 3 QTS
1)
Present an example where data mining is crucial to the success of a business. What data mining functionalities does this business need (e.g., think of the kinds of patterns that could be mined)?
2)
Suppose that the data for analysis includes the attribute grade. The grade values for the data tuples are (in increasing order) 10, 12, 13, 13, 16, 17, 17, 18, 19, 19, 22, 22, 22, 22, 27, 30, 30, 32, 32, 32, 32, 32, 37, 42, 43, 49, 67.
a) Find the interquartile range.
b) Draw a boxplot of the data.
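As a way to check the quartile arithmetic for question 2, the five-number summary can be sketched in Python. Quartile conventions differ between textbooks; this sketch assumes the default "exclusive" method of `statistics.quantiles` and the common 1.5 × IQR rule for whiskers, so results under another convention may differ.

```python
import statistics

# Grade values from question 2, already in increasing order.
GRADES = [10, 12, 13, 13, 16, 17, 17, 18, 19, 19, 22, 22, 22, 22,
          27, 30, 30, 32, 32, 32, 32, 32, 37, 42, 43, 49, 67]

# Quartiles under the "exclusive" convention (the library default).
q1, median, q3 = statistics.quantiles(GRADES, n=4)
iqr = q3 - q1

# Assumed boxplot rule: points beyond 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [g for g in GRADES if g < low_fence or g > high_fence]

# Whiskers extend to the most extreme values still inside the fences.
whisker_low = min(g for g in GRADES if g >= low_fence)
whisker_high = max(g for g in GRADES if g <= high_fence)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers=({whisker_low}, {whisker_high}), outliers={outliers}")
```

The printed values are the ones a boxplot needs: the box spans Q1 to Q3, the line inside it marks the median, and any listed outliers are drawn as individual points.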
3)
Consider the following relational database for the Central Zoo. Central Zoo wants to maintain information about its animals, the enclosures in which they live, and its zookeepers and the services they perform for the animals. In addition, Central Zoo has a program through which people can sponsor animals. Central Zoo wants to track its sponsors, their dependents, and associated data. Each animal has a unique animal number and each enclosure has a unique enclosure number. An animal can live in only one enclosure. An enclosure can have several animals in it or it can be currently empty. A zookeeper has a unique employee number. Every animal has been cared for by at least one and generally many zookeepers; each zookeeper has cared for at least one and generally many animals. Each time a zookeeper performs a specific, significant service for an animal, the service type, date, and time are recorded. A zookeeper may perform a particular service on a particular animal more than once on a given day. A sponsor, who has a unique sponsor number and a unique social security number, sponsors at least one and possibly several animals. An animal may have several sponsors or none. For each animal that a particular sponsor sponsors, the zoo wants to track the annual sponsorship contribution and renewal date. In addition, Central Zoo wants to keep track of each sponsor’s dependents. A sponsor may have several dependents or none. A dependent is associated with exactly one sponsor.
a) Describe three OLAP uses of this data warehouse.
b) Design a multidimensional database using a star schema for a data warehouse for the Central Zoo business environment.
4)
Consider the market basket transactions in the following table:
CID   TID   Items Bought
10    1     {T, S, R}
20    2     {Q, P, T}
30    3     {T, R, O}
40    4     {Q, P, O}
50    5     {S, O, R}
50    6     {T, R, Q, P}
40    7     {Q, P, R}
30    8     {S, R}
20    9     {T, R, Q, P}
10    10    {S, O}
a) List 3 different association rules from the above table, treating each customer (CID) as a market basket. Each item should be treated as a binary variable (1 if the item appears in at least one transaction of the customer, and 0 otherwise). Show the support and confidence for each rule.
b) Show the Frequent Pattern tree (FP tree) that would be made for the data set. Let min_sup = 30%.
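The customer-level view in part a) can be sketched in Python: the transactions of each CID are merged into one basket, and support and confidence are then computed over the five customers. The helper names `support` and `confidence` are illustrative, and the rules evaluated at the end are arbitrary examples rather than a prescribed answer.

```python
# Transactions from question 4 as (CID, TID, items).
TRANSACTIONS = [
    (10, 1, {"T", "S", "R"}),
    (20, 2, {"Q", "P", "T"}),
    (30, 3, {"T", "R", "O"}),
    (40, 4, {"Q", "P", "O"}),
    (50, 5, {"S", "O", "R"}),
    (50, 6, {"T", "R", "Q", "P"}),
    (40, 7, {"Q", "P", "R"}),
    (30, 8, {"S", "R"}),
    (20, 9, {"T", "R", "Q", "P"}),
    (10, 10, {"S", "O"}),
]

# Merge every transaction of a customer into a single basket, so an item
# counts as present (1) if it appears in at least one of their transactions.
baskets = {}
for cid, _tid, items in TRANSACTIONS:
    baskets.setdefault(cid, set()).update(items)


def support(itemset):
    """Fraction of customer baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets.values() if itemset <= basket)
    return hits / len(baskets)


def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs over customer baskets."""
    return support(lhs | rhs) / support(lhs)


# Example rules, chosen only for illustration.
for lhs, rhs in [({"Q"}, {"P"}), ({"S"}, {"O"}), ({"T"}, {"S"})]:
    print(sorted(lhs), "=>", sorted(rhs),
          f"support={support(lhs | rhs):.2f}",
          f"confidence={confidence(lhs, rhs):.2f}")
```

Because every customer's basket is a union of two transactions, customer-level supports are generally higher than the transaction-level supports of the same itemsets.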
5)
Consider the following table, which describes the sales data for an electronics company according to the dimensions time, item, and location.
Pid   Quarter   Locid   Sales
300   2         3       30
300   3         3       13
300   4         3       20
400   2         3       35
400   3         3       25
400   4         3       55
500   2         3       13
500   3         3       15
500   4         3       15
300   2         4       40
300   3         4       27
300   4         4       15
400   2         4       31
400   3         4       50
400   4         4       25
500   2         4       25
500   3         4       45
500   4         4       10
a) Find the result of the roll-up (drill-up) operation on location.
b) Find the result of the drill-down operation on time from quarters to months.
c) Find the result of the slice operation for time = "Q3".
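Roll-up and slice on this cube can be sketched in plain Python. Since the table gives no location hierarchy, the sketch assumes that rolling up on location means aggregating over all Locid values; the function names are illustrative. Drill-down to months is not shown because the table holds no monthly figures to drill into.

```python
from collections import defaultdict

# Sales facts from question 5 as (Pid, Quarter, Locid, Sales).
SALES = [
    (300, 2, 3, 30), (300, 3, 3, 13), (300, 4, 3, 20),
    (400, 2, 3, 35), (400, 3, 3, 25), (400, 4, 3, 55),
    (500, 2, 3, 13), (500, 3, 3, 15), (500, 4, 3, 15),
    (300, 2, 4, 40), (300, 3, 4, 27), (300, 4, 4, 15),
    (400, 2, 4, 31), (400, 3, 4, 50), (400, 4, 4, 25),
    (500, 2, 4, 25), (500, 3, 4, 45), (500, 4, 4, 10),
]


def roll_up_location(rows):
    """Sum sales over all locations, leaving a (Pid, Quarter) cuboid."""
    totals = defaultdict(int)
    for pid, quarter, _locid, sales in rows:
        totals[(pid, quarter)] += sales
    return dict(totals)


def slice_quarter(rows, quarter):
    """Fix the time dimension to one quarter, leaving (Pid, Locid, Sales)."""
    return [(pid, locid, sales)
            for pid, q, locid, sales in rows if q == quarter]


for (pid, quarter), total in sorted(roll_up_location(SALES).items()):
    print(f"Pid={pid} Q{quarter} total={total}")

for pid, locid, sales in slice_quarter(SALES, 3):
    print(f"Q3 slice: Pid={pid} Locid={locid} Sales={sales}")
```

Roll-up removes a dimension by aggregation, while slice removes one by selection: the first changes the values in the cells, the second only discards cells.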
6)
The following contingency table summarizes the relationship between people who drink tea and people who drink coffee. Here, coffee refers to people who drink coffee, ¬coffee to people who do not drink coffee, tea to people who drink tea, and ¬tea to people who do not drink tea.
         coffee   ¬coffee
tea      20       5
¬tea     10       15
a) Suppose that the association rule “coffee ⇒ tea” is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
b) Based on the given data, can we conclude that coffee drinkers and tea drinkers are independent? If not, what kind of correlation relationship exists between the two?
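The quantities involved in question 6 can be sketched in Python. The sketch assumes lift is the correlation measure intended in part b); other measures, such as the chi-square statistic, could be used instead. Variable names are illustrative.

```python
# Contingency counts from question 6, keyed by (tea status, coffee status).
COUNTS = {
    ("tea", "coffee"): 20,
    ("tea", "not_coffee"): 5,
    ("not_tea", "coffee"): 10,
    ("not_tea", "not_coffee"): 15,
}
MIN_SUPPORT = 0.25
MIN_CONFIDENCE = 0.50

total = sum(COUNTS.values())
n_coffee = sum(n for (_, c), n in COUNTS.items() if c == "coffee")
n_tea = sum(n for (t, _), n in COUNTS.items() if t == "tea")
n_both = COUNTS[("tea", "coffee")]

# Rule "coffee => tea": support is P(coffee and tea), confidence is P(tea | coffee).
support = n_both / total
confidence = n_both / n_coffee
is_strong = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE

# Lift compares the joint probability with what independence would predict:
# lift = 1 suggests independence, > 1 positive and < 1 negative correlation.
lift = support / ((n_coffee / total) * (n_tea / total))

print(f"support={support:.2f} confidence={confidence:.3f} strong={is_strong}")
print(f"lift={lift:.3f}")
```

A rule can pass both thresholds and still be misleading when the consequent is common on its own, which is why the lift value is checked separately from the strength test.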
7)
A database has ten transactions. Let min_sup = 30%.
TID    Items Bought
100    {A, B, D, E}
200    {B, C, D}
300    {A, B, D, E}
400    {A, C, D, E}
500    {B, C, D, E}
600    {B, D, E}
700    {C, D}
800    {A, B, C}
900    {A, D, E}
1000   {B, D}
a) Apply the Apriori algorithm to the above data set.
b) Show the FP tree that would be made for the data set.
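Both parts of question 7 can be sketched in Python. This is a minimal illustration, not an optimized implementation: `apriori` does the level-wise join, prune, and count steps, and `build_fp_tree` builds only the prefix tree, without the header table and node links of a full FP-tree. The tree assumes that ties in item support are broken alphabetically; a different tie-break gives a differently shaped but equally valid tree.

```python
from collections import Counter
from itertools import combinations

# The ten transactions from question 7, keyed by TID.
TRANSACTIONS = {
    100: {"A", "B", "D", "E"}, 200: {"B", "C", "D"},
    300: {"A", "B", "D", "E"}, 400: {"A", "C", "D", "E"},
    500: {"B", "C", "D", "E"}, 600: {"B", "D", "E"},
    700: {"C", "D"}, 800: {"A", "B", "C"},
    900: {"A", "D", "E"}, 1000: {"B", "D"},
}
MIN_COUNT = 3  # min_sup = 30% of 10 transactions


def apriori(baskets, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    baskets = [frozenset(b) for b in baskets]
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate, then keep those that meet the threshold.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets with exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def build_fp_tree(baskets, min_count):
    """Return nested dicts of the form {"count": int, "children": {item: node}}."""
    counts = Counter(item for b in baskets for item in set(b))
    # Assumed ordering: descending support, ties broken alphabetically.
    ranked = sorted((i for i in counts if counts[i] >= min_count),
                    key=lambda i: (-counts[i], i))
    rank = {item: pos for pos, item in enumerate(ranked)}
    root = {"count": 0, "children": {}}
    for b in baskets:
        node = root
        # Insert the frequent items of this transaction in rank order.
        for item in sorted((i for i in set(b) if i in rank), key=rank.get):
            node = node["children"].setdefault(
                item, {"count": 0, "children": {}})
            node["count"] += 1
    return root


frequent = apriori(TRANSACTIONS.values(), MIN_COUNT)
tree = build_fp_tree(TRANSACTIONS.values(), MIN_COUNT)

for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

The nested-dict tree can be inspected by walking `tree["children"]`; each node's count is the number of transactions sharing that prefix.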
8)
A survey of college students determined the preference for cell phone providers. The following data were obtained.
          Provider
Gender    T-Mobile   AT&T   Verizon   Other
Male      12         39     27        16
Female    8          22     24        12
Can we conclude that gender and cell phone provider are independent? (Hint: Assume a significance level of 0.05.)
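A chi-square test of independence for this table can be sketched in plain Python. The sketch assumes the Pearson chi-square statistic is the intended test and hard-codes the standard table value 7.815 as the critical value for 3 degrees of freedom at the 0.05 level; the function name is illustrative.

```python
# Observed counts from question 8.
OBSERVED = {
    "Male": {"T-Mobile": 12, "AT&T": 39, "Verizon": 27, "Other": 16},
    "Female": {"T-Mobile": 8, "AT&T": 22, "Verizon": 24, "Other": 12},
}
CRITICAL_VALUE = 7.815  # chi-square table value for df = 3, alpha = 0.05


def chi_square(observed):
    """Return (Pearson chi-square statistic, degrees of freedom)."""
    providers = list(next(iter(observed.values())))
    row_totals = {r: sum(cols.values()) for r, cols in observed.items()}
    col_totals = {c: sum(observed[r][c] for r in observed) for c in providers}
    total = sum(row_totals.values())

    stat = 0.0
    for r in observed:
        for c in providers:
            # Expected count if gender and provider were independent.
            expected = row_totals[r] * col_totals[c] / total
            stat += (observed[r][c] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square(OBSERVED)
reject_independence = stat > CRITICAL_VALUE
print(f"chi-square={stat:.3f}, df={dof}, reject independence: {reject_independence}")
```

If the statistic stays below the critical value, the data give no evidence against independence at the chosen level, which is weaker than proving that the two attributes are independent.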