Data Mining Questions

  

QUESTION 1

Suppose that you are employed as a data mining consultant for an Internet search engine company. Describe how data mining can help the company by giving specific examples of how techniques, such as clustering, classification, association rule mining, and anomaly detection can be applied.

QUESTION 2

Identify at least two advantages and two disadvantages of using color to visually represent information.

QUESTION 3

Consider a group of documents that has been selected from a much larger set of diverse documents so that the selected documents are as dissimilar from one another as possible. If we consider documents that are not highly related (connected, similar) to one another as being anomalous, then all of the documents that we have selected might be classified as anomalies. Is it possible for a data set to consist only of anomalous objects or is this an abuse of the terminology?

QUESTION 4

(a) Is there a difference between the two sets of points? Please explain. 

(b) If so, which set of points will typically have a smaller SSE for K=10 clusters? 

(c) What will be the behavior of DBSCAN on the uniform data set? 

QUESTION 5

Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.

Question 3. Anomalies

A data set consists mostly of objects that are related to one another; these are known as normal objects. The same data set may also contain anomalous objects, which do not conform to the rest. An object is therefore anomalous only relative to the normal objects around it, so a data set made up entirely of anomalies has no baseline to deviate from, and describing every object as anomalous is arguably an abuse of the terminology. Anomalous objects attract attention because they carry unusual information that merits closer examination (Hossain, Akhtar, Ahmad, & Rahman, 2019).
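The idea that an anomaly only stands out against normal objects can be sketched with a simple distance-based score. This is an illustration only, not the method of the cited authors: the function `knn_scores`, the choice of the k-th nearest neighbour distance as the score, and both small point sets are assumptions made for the example.

```python
import math

def knn_scores(points, k=2):
    """Anomaly score of each point = distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical data: a tight 3x3 block of "normal" objects plus one distant object.
mostly_normal = [(x * 0.1, y * 0.1) for x in range(3) for y in range(3)] + [(5.0, 5.0)]
scores = knn_scores(mostly_normal)
outlier_index = scores.index(max(scores))  # the distant object has the largest score

# Hypothetical data: four objects that are all equally far from their neighbours.
all_dissimilar = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
flat_scores = knn_scores(all_dissimilar)   # every object gets the same score

print(outlier_index, flat_scores)
```

In the first set one object clearly deviates from the rest. In the second, every score is identical, so no object is more anomalous than any other, which is the situation described in Question 3.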

Cluster validity measures

Question 4 (a)

Defining normal regions is challenging because the boundary between normal and abnormal regions is often narrow, so the two cannot always be distinguished precisely.

Question 4 (b)

A normal data set will typically have a smaller SSE for K=10 clusters because its points are related to one another. Each point therefore lies close to its nearest cluster centroid, which gives a smaller sum of squared errors than an abnormal data set (Hossain, Akhtar, Ahmad, & Rahman, 2019).
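The SSE comparison can be tried on synthetic data. The sketch below is an assumption-laden illustration: the two point sets from the question are not reproduced here, so it substitutes one hypothetical set with ten tight groups and one hypothetical set spread uniformly over the unit square, and uses a basic K-means written for the example (`kmeans_sse` is not a library function).

```python
import random

def kmeans_sse(points, k, restarts=5, iters=30, seed=0):
    """Best SSE over several runs of basic (Lloyd's) K-means on 2-D points."""
    rng = random.Random(seed)

    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)  # random initial centroids
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:  # assignment step: nearest centroid
                nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[nearest].append(p)
            # Update step: an empty cluster keeps its previous centroid.
            centers = [
                (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                if g else centers[i]
                for i, g in enumerate(groups)
            ]
        sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, sse)
    return best

rng = random.Random(1)
# Assumed data: ten tight groups of 20 points each, and 200 uniform random points.
blob_centers = [(0.1 + 0.2 * i, y) for i in range(5) for y in (0.25, 0.75)]
clustered = [(rng.gauss(cx, 0.02), rng.gauss(cy, 0.02))
             for cx, cy in blob_centers for _ in range(20)]
uniform = [(rng.random(), rng.random()) for _ in range(200)]

sse_clustered = kmeans_sse(clustered, 10)
sse_uniform = kmeans_sse(uniform, 10)
print(f"clustered SSE: {sse_clustered:.3f}, uniform SSE: {sse_uniform:.3f}")
```

Under these assumptions the grouped data should give the smaller SSE, since each point sits close to a centroid, while the uniform points stay spread out however the ten centroids are placed.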

Question 4 (c)

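DBSCAN will merge uniform data into a single cluster and classify non-uniform data as noise. Because the density of a uniform data set is roughly the same everywhere, the outcome depends on the density threshold (Eps and MinPts): the points are either all merged into one cluster or all labelled as noise. DBSCAN also addresses the boundary issue by separating dense regions from sparse ones and using that difference in density to cluster the data (Hossain, Akhtar, Ahmad, & Rahman, 2019).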
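The behavior of DBSCAN on evenly spread points can be illustrated with a small, self-written version of the algorithm. This is a minimal sketch rather than a reference implementation: the function `dbscan`, the 10x10 regular grid standing in for the uniform data set, and the two `eps` values are all assumptions chosen for the example.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if (x - xi) ** 2 + (y - yi) ** 2 <= eps ** 2]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # includes the point itself
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels

# Assumed stand-in for a uniform data set: a 10x10 grid with spacing 1.
grid = [(x, y) for x in range(10) for y in range(10)]

wide_labels = dbscan(grid, eps=1.1, min_pts=4)    # radius reaches adjacent grid points
narrow_labels = dbscan(grid, eps=0.5, min_pts=4)  # radius reaches no other point

print(set(wide_labels))    # {0}: every point joins one cluster
print(set(narrow_labels))  # {-1}: every point is noise
```

With the larger radius all points form one cluster, and with the smaller radius all points are noise, so on evenly spread data the result is decided by the threshold rather than by any structure in the points.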
