Assignment – Zero plagiarism
I already have the common answers for the assignment.
I need zero plagiarism in the assignment.
I only need changes to Questions 3 and 4:
keep the same answers, but change the way they are written and reword them.
Assignment 1
Deadline: Thursday 24/09/2020 @ 23:59
[Total Mark for this Assignment is 6]
Data Mining and Data Warehousing
IT446
College of Computing and Informatics
Question One
1.5 Marks
Learning Outcome(s):
Apply and evaluate data mining algorithms with respect to problems they are specifically designed for
Using equi-depth partition, create 3 bins to smooth the given data input by:
· Boundaries
· Means
Data: 23 10 5 14 20 16 11 6 20 1 14 27 2 25 1
Answer:
Step one is sorting the data
1 1 2 5 6 10 11 14 14 16 20 20 23 25 27
Step two is creation of bins
B1: 1 1 2 5 6
B2: 10 11 14 14 16
B3: 20 20 23 25 27
Smoothing by bin boundaries:
B1: 1 1 1 6 6
B2: 10 10 16 16 16
B3: 20 20 20 27 27
Smoothing by bin means:
B1: 3 3 3 3 3
B2: 13 13 13 13 13
B3: 23 23 23 23 23
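The steps above can be checked with a short script. This is a minimal sketch, not part of the assignment: the function names are illustrative, and it assumes the number of values divides evenly into the number of bins, as it does here (15 values, 3 bins).

```python
def equi_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    ordered = sorted(values)
    size = len(ordered) // n_bins  # assumes len(values) is divisible by n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_boundaries(bin_values):
    """Replace each value with the closer of the bin's minimum and maximum."""
    low, high = bin_values[0], bin_values[-1]
    # Ties go to the lower boundary (an assumption; no tie occurs in this data).
    return [low if v - low <= high - v else high for v in bin_values]


def smooth_by_means(bin_values):
    """Replace each value with the mean of its bin."""
    mean = sum(bin_values) / len(bin_values)
    return [mean] * len(bin_values)


data = [23, 10, 5, 14, 20, 16, 11, 6, 20, 1, 14, 27, 2, 25, 1]
bins = equi_depth_bins(data, 3)
for number, b in enumerate(bins, start=1):
    print(f"B{number}: bin={b} "
          f"boundaries={smooth_by_boundaries(b)} means={smooth_by_means(b)}")
```

The output reproduces the bins and the smoothed values listed in the answer.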
References: chapter 3 of the book, page 30
Lecture 3, slide 57
Question Two
1.5 Marks
Learning Outcome(s):
Apply and evaluate data mining algorithms with respect to problems they are specifically designed for
1. Given the following dataset, fill in the missing values with the attribute mean for all samples belonging to the same class
Answer:
Class A: 10 + 12 = 22, 22 / 2 = 11 (mean)
Class B: 20 + 22 + 24 = 66, 66 / 3 = 22 (mean)
Object     Attribute 1   Attribute 2   Attribute 3   Class
Object 1   10            8             -4            A
Object 2   12            -8            5             A
Object 3   11            9             5             A
Object 4   20            10            10            B
Object 5   22            12            11            B
Object 6   24            15            18            B
Object 7   22            14            19            B
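The class-mean fill used in the answer can be sketched as follows. This is an illustration only: the text does not say which objects had the missing Attribute 1 value, so the `None` entries for Object 3 and Object 7 are an assumption chosen to match the sums shown in the answer, and `fill_with_class_mean` is a hypothetical helper name.

```python
from collections import defaultdict

# (object, Attribute 1, class); None marks a missing value.
# Which objects were originally missing is an assumption.
rows = [
    ("Object 1", 10, "A"),
    ("Object 2", 12, "A"),
    ("Object 3", None, "A"),
    ("Object 4", 20, "B"),
    ("Object 5", 22, "B"),
    ("Object 6", 24, "B"),
    ("Object 7", None, "B"),
]


def fill_with_class_mean(rows):
    """Fill each missing value with the mean of the known values in its class."""
    known = defaultdict(list)
    for _, value, label in rows:
        if value is not None:
            known[label].append(value)
    means = {label: sum(vals) / len(vals) for label, vals in known.items()}
    return [(name, means[label] if value is None else value, label)
            for name, value, label in rows]


filled = fill_with_class_mean(rows)
for name, value, label in filled:
    print(name, value, label)
```

Under that assumption, Object 3 receives 11 (the class A mean) and Object 7 receives 22 (the class B mean).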
2. Using Attribute 1, Attribute 2 and Attribute 3, calculate the Manhattan distance between Object 1 and Object 2.
Answer:
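The worked answer is not present in the text, so the following is only a sketch of how the Manhattan distance could be computed. It assumes the dash in front of the values 4 (Object 1, Attribute 3) and 8 (Object 2, Attribute 2) is a minus sign; if those values are actually positive, the result would be 2 + 0 + 1 = 3 instead.

```python
def manhattan(a, b):
    """Sum of absolute differences over all attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Attribute 1, Attribute 2, Attribute 3
object_1 = (10, 8, -4)   # assumes the dash before 4 is a minus sign
object_2 = (12, -8, 5)   # assumes the dash before 8 is a minus sign

distance = manhattan(object_1, object_2)
print(distance)  # |10 - 12| + |8 - (-8)| + |-4 - 5| = 2 + 16 + 9 = 27
```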
3. Using Attribute 1, Attribute 2 and Attribute 3, calculate the Euclidean distance between Object 4 and Object 5.
Answer:
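The worked answer is not present in the text. As a sketch, the Euclidean distance between Object 4 and Object 5 can be computed from the three attribute values given in the dataset; the function name is illustrative.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared differences over all attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Attribute 1, Attribute 2, Attribute 3
object_4 = (20, 10, 10)
object_5 = (22, 12, 11)

distance = euclidean(object_4, object_5)
print(distance)  # sqrt(2^2 + 2^2 + 1^2) = sqrt(9) = 3.0
```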
References
Chapter 2 of the book, page 73.
Lecture 2, slide 37
Question Three
1 Mark
Learning Outcome(s):
Explain the basic principles of programming, concept of language. Universal constructs of programming languages.(LO1)
What is the significance of OLAP (online analytical processing) in data mining?
Answer:
1. Data warehouse systems serve users or knowledge workers in the role of data analysis and decision making. Such systems can organize and present data in various formats to accommodate the diverse needs of different users. These systems are known as online analytical processing (OLAP) systems.
2. An OLAP query usually needs only read-only access to data records, for summarization and aggregation.
3. Because access is mostly read-only, OLAP queries do not need the concurrency control and recovery mechanisms that transaction processing requires.
4. Concept hierarchies are useful in OLAP. In the multidimensional model, data are organized into multiple dimensions, and each dimension contains multiple levels of abstraction defined by concept hierarchies. This organization gives users the flexibility to view data from different perspectives. A number of OLAP data cube operations exist to materialize these different views, allowing interactive querying and analysis of the data at hand. Hence, OLAP provides a user-friendly environment for interactive data analysis.
5. OLAP supports operations such as roll-up, drill-down, slice and dice, and pivot. Exploring the data from different points of view in this way makes it possible to extract more useful knowledge.
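To make the operations in point 5 concrete, here is a minimal sketch of roll-up and slice on a tiny fact table. Everything in it is hypothetical: the cities, quarters and sales figures are made up for illustration, and `aggregate` is an illustrative helper, not the API of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact table: (city, country, quarter, sales). Values are made up.
facts = [
    ("Riyadh", "Saudi Arabia", "Q1", 100),
    ("Jeddah", "Saudi Arabia", "Q1", 80),
    ("Riyadh", "Saudi Arabia", "Q2", 120),
    ("Cairo", "Egypt", "Q1", 70),
    ("Cairo", "Egypt", "Q2", 90),
]


def aggregate(rows, key):
    """Sum the sales measure for every distinct value of key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Detailed view: sales per (city, quarter).
by_city_quarter = aggregate(facts, lambda r: (r[0], r[2]))

# Roll-up: climb the location hierarchy from city to country.
by_country_quarter = aggregate(facts, lambda r: (r[1], r[2]))

# Slice: fix one dimension (quarter = Q1), then summarize by country.
q1_slice = [r for r in facts if r[2] == "Q1"]
q1_by_country = aggregate(q1_slice, lambda r: r[1])

print(by_country_quarter)
print(q1_by_country)
```

Going from `by_country_quarter` back to `by_city_quarter` corresponds to drill-down, since it moves to a lower level of the same concept hierarchy.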
References
Chapter 4 of the book, pages 128, 129, 130, 146, 148
Question Four
2 Marks
Learning Outcome(s):
Explain the basic principles of programming, concept of language. Universal constructs of programming languages.(LO1)
Write about the following Terms:
Similarity, Dissimilarity, Data matrix and Dissimilarity matrix (give example for Data matrix and Dissimilarity matrix)
Answer:
A similarity measure compares two objects, i and j, and typically returns 0 if the objects are completely unalike. The higher the similarity value, the greater the similarity between the objects. (Typically, a value of 1 indicates complete similarity, that is, the objects are identical.) In other words:
· Numerical measure of how alike two data objects are
· Value is higher when objects are more alike
· Often falls in the range [0,1]
A dissimilarity measure (distance) works the opposite way. It compares two objects, i and j, and returns 0 if the objects are the same. The higher the dissimilarity value, the more the two objects differ; equivalently, the higher the similarity, the lower the dissimilarity. In other words:
· Numerical measure of how different two data objects are
· Lower when objects are more alike
· Minimum dissimilarity is often 0
· Upper limit varies
Data matrix:
It is a data structure used to store the data objects. It is also known as the object-by-attribute structure. It stores the n data objects in the form of a relational table, that is, an n-by-p matrix (n objects × p attributes). Example:
Dissimilarity matrix:
It is a data structure used to store dissimilarity values for pairs of objects. It is also known as the object-by-object structure. It stores the collection of values that are available for all pairs of the n objects. It records only the distances, not the attribute values. Example with Euclidean distance:
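As a sketch of how the two structures relate, the script below builds the lower-triangular dissimilarity matrix from a data matrix using Euclidean distance. The four points x1 to x4 are the ones in this document's example data; the function name and the dictionary layout are illustrative choices, and `math.dist` requires Python 3.8 or later.

```python
import math

# Data matrix: one row per object, one column per attribute.
data_matrix = {
    "x1": (1, 2),
    "x2": (3, 5),
    "x3": (2, 0),
    "x4": (4, 5),
}


def dissimilarity_matrix(data):
    """Return the lower triangle (including the diagonal) of pairwise distances."""
    names = list(data)
    matrix = {}
    for i, a in enumerate(names):
        matrix[a] = [round(math.dist(data[a], data[b]), 2) for b in names[:i + 1]]
    return matrix


matrix = dissimilarity_matrix(data_matrix)
for name, row in matrix.items():
    print(name, row)
# x1 [0.0]
# x2 [3.61, 0.0]
# x3 [2.24, 5.1, 0.0]
# x4 [4.24, 1.0, 5.39, 0.0]
```

Only the lower triangle is stored because the distance is symmetric, d(i, j) = d(j, i), and every object has distance 0 to itself.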
References:
Chapter 2 of the book, pages 67-68
Lecture 2, slide 30
Manhattan distance:
$$d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$$
Euclidean distance:
$$d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2}$$
Data matrix example:
point   attribute1   attribute2
x1      1            2
x2      3            5
x3      2            0
x4      4            5
Dissimilarity matrix example (Euclidean distance):
      x1     x2     x3     x4
x1    0
x2    3.61   0
x3    2.24   5.1    0
x4    4.24   1      5.39   0