Data mining Assignment

  



ITS-632 Intro to Data Mining

 


Dr. Patrick Haney

Dept. of Information Technology &

School of Computer and Information Sciences

University of the Cumberlands 

 

Chapter 2 Assignment

1. What’s an attribute? What’s a data instance?

2. What’s noise? How can noise be reduced in a dataset?

3. Define outlier. Describe 2 different approaches to detect outliers in a dataset.

4. Describe 3 different techniques to deal with missing values in a dataset. Explain when each of these techniques would be most appropriate.

5. Given a sample dataset with missing values, apply an appropriate technique to deal with them.

6. Give 2 examples in which aggregation is useful.

7. Given a sample dataset, apply aggregation of data values.

8. What’s sampling?

9. What’s simple random sampling? Is it possible to sample data instances using a distribution different from the uniform distribution? If so, give an example of a probability distribution of the data instances that is different from uniform (i.e., equal probability).

10. What’s stratified sampling?

11. What’s “the curse of dimensionality”?

12. Provide a brief description of what Principal Components Analysis (PCA) does. [Hint: See Appendix A and your lecture notes.] State what the input and the output of PCA are.

13. What’s the difference between dimensionality reduction and feature selection?

14. Describe in detail 2 different techniques for feature selection.

15. Given a sample dataset (represented by a set of attributes, a correlation matrix, a covariance matrix, …), apply feature selection techniques to select the best attributes to keep (or, equivalently, the best attributes to remove).

16. What’s the difference between feature selection and feature extraction?

17. Give two examples of data in which feature extraction would be useful.

18. Given a sample dataset, apply feature extraction.

19. What’s data discretization and when is it needed?

20. What’s the difference between supervised and unsupervised discretization?

21. Given a sample dataset, apply unsupervised (e.g., equal width, equal frequency) discretization, or supervised discretization (e.g., using entropy).

22. Describe 2 approaches to handle nominal attributes with too many values.

23. Given a dataset, apply a variable transformation: either a simple given function, normalization, or standardization.

24. Define correlation and covariance, and explain how to use them in data pre-processing (see pp. 76-78).
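
To make the techniques named in questions 21, 23, and 24 concrete, here is a minimal Python sketch of equal-width and equal-frequency discretization, min-max normalization, standardization, and sample covariance and correlation. It is an illustration under stated assumptions, not a model answer: the sample values and every function name below are hypothetical and do not come from the assignment or the textbook, and the code assumes numeric attributes with at least two distinct values.

```python
import statistics

# Hypothetical sample attributes (invented for illustration only).
ages = [18, 22, 25, 31, 38, 45, 52, 67]
incomes = [21, 25, 30, 38, 41, 55, 60, 64]


def equal_width_bins(values, k):
    """Unsupervised discretization: k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    # The maximum value would land in bin k, so clamp it into bin k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Unsupervised discretization: k bins holding about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the population std. dev."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both sample std. devs."""
    return covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


if __name__ == "__main__":
    print(equal_width_bins(ages, 4))
    print(equal_frequency_bins(ages, 4))
    print(min_max_normalize(ages))
    print(standardize(ages))
    print(covariance(ages, incomes), correlation(ages, incomes))
```

In pre-processing, a correlation close to 1 or -1 between two attributes suggests that one of them is largely redundant, which is one common basis for the feature selection asked about in question 15.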
