Clustering, Especially K-Means, Using Python and Anaconda
HW: Clustering
MCIS-6273: Data Mining
This assignment gives you a basic understanding of clustering, especially K-Means, using Python
and Anaconda.
Please download the dataset Mall_Customers.csv from Blackboard. It will be used throughout this
assignment.
Using K-Means
Part 1: [10 points]
First, read the data set into your code. Save two data features to X. Please pick the fourth feature
(Annual Income (k$)) and the fifth feature (Spending Score (1-100)); with only two features, the
clusters can be visualized in a two-dimensional plot.
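The reading step above could be sketched as follows. This is only an illustration under stated assumptions: the inline CSV text is a tiny stand-in with invented rows, its column layout is assumed to match Mall_Customers.csv, and `load_features` is a hypothetical helper name. Selecting by position follows the "fourth feature" and "fifth feature" wording, so the code does not depend on the exact header spelling.

```python
import io

import pandas as pd

# Tiny stand-in for Mall_Customers.csv. The rows are invented for illustration
# and the column layout is an assumption about the real file.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,30,40,55
2,Female,45,90,12
3,Female,22,25,80
4,Male,58,70,35
"""


def load_features(source, positions=(3, 4)):
    """Read a CSV and return the columns at the given 0-based positions as an array."""
    df = pd.read_csv(source)
    # Positions 3 and 4 are the fourth and fifth columns.
    return df.iloc[:, list(positions)].to_numpy()


X = load_features(io.StringIO(SAMPLE_CSV))
print(X.shape)
```

With the real file, the call would presumably become `load_features("Mall_Customers.csv")`, assuming the file sits in the working directory.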
Please do the following:
1. Use the elbow method to find the optimal number of clusters.
2. Fit K-Means to the dataset using the optimal number of clusters found by the
elbow method.
3. Predict the cluster labels y for data set X.
4. Visualize the clustering results, using a different color for each cluster.
   1. The title, x label, and y label must be specified.
   2. A legend must be included.
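The four steps above might look roughly like the sketch below. It uses scikit-learn's `KMeans` on synthetic blob data rather than the mall data, and the helper names `elbow_inertias` and `plot_clusters` are hypothetical. The elbow is normally read off a plot of inertia against k by eye, so the value k = 4 used here is an assumption that holds only because the synthetic data was generated with four centers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_inertias(X, k_values, seed=0):
    """Fit K-Means once per k and return the within-cluster sum of squares for each."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


def plot_clusters(X, labels, centers, title, xlabel, ylabel, path):
    """Scatter-plot each cluster in its own color and save the figure to `path`."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without matplotlib

    fig, ax = plt.subplots()
    for cluster in np.unique(labels):
        members = X[labels == cluster]
        # Each scatter call takes the next color from the default color cycle.
        ax.scatter(members[:, 0], members[:, 1], label=f"Cluster {cluster}")
    ax.scatter(centers[:, 0], centers[:, 1], c="black", marker="x", label="Centroids")
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Synthetic stand-in data (NOT the mall data): four well-separated blobs.
X, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=0)

# Step 1: inertia for k = 1..10; plot these against k and look for the bend.
k_values = list(range(1, 11))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))

# Steps 2 and 3: fit with the k read off the elbow (4 is assumed here because
# the synthetic data has four centers), then predict a label for every row.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
y = kmeans.predict(X)
```

Step 4 would then be a call such as `plot_clusters(X, y, kmeans.cluster_centers_, "Clusters of customers", "Annual Income (k$)", "Spending Score (1-100)", "clusters.png")`, where the title and the output file name are placeholders.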
Part 2: [10 points]
Repeat the steps in Part 1, but now use the second feature (Gender) and the third feature (Age) to
visualize the clusters. [This part may be trickier.]
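One likely reason this part is trickier is that K-Means works on numeric distances, while Gender is presumably stored as text. The sketch below shows one possible way to prepare such data, not a required method. It assumes the labels are "Male" and "Female", uses invented rows, and standardizes both columns so that Age does not dominate the distance; whether to scale at all is a design choice.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented rows for illustration; the label spellings are an assumption.
df = pd.DataFrame(
    {"Gender": ["Male", "Female", "Female", "Male"], "Age": [19, 35, 52, 44]}
)

# Encode the categorical column as 0/1 so that distances can be computed.
gender_code = df["Gender"].map({"Female": 0, "Male": 1})

X = pd.DataFrame({"Gender": gender_code, "Age": df["Age"]}).to_numpy(dtype=float)

# Optional: put both columns on a comparable scale (zero mean, unit variance).
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.round(2))
```

Because the encoded Gender column takes only two values, the scatter plot will show points on two lines, which is worth keeping in mind when interpreting the clusters.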
Guidelines:
• This assignment is to be solved in groups of two students, not more.
• You only need to submit a nicely formatted PDF report containing: [5 points]
◦ Title page: title and group member names
◦ Table of contents (ToC) page
◦ Pages should be numbered, and the page numbers should appear in the ToC
◦ A snapshot of each of the figures described below (see the Notes)
▪ Each snapshot must have a 10-word caption describing the picture.
◦ Only one report per group should be submitted
◦ No need to submit any code
Notes:
• To help you read and handle the data and to guide your work, you will be given the code
example_3D.py and the data file 3D_network.csv.
◦ You should first run the code and understand what it does.
◦ You will also be given a code file named practice_blobs.py. Run it in Anaconda to see
the output and how the different steps should be performed, so you know what to do.
◦ The provided code runs without issues, so resolving any problems you encounter when
running it is your responsibility.
• To learn more about the elbow method mentioned above for choosing the right number of
clusters, see:
https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/
• The report you will submit should have the figures below.
◦ To give you an idea, running practice_blobs.py gives the following output: [arrows
for output order]
predicted group: 2
distance from center 0 is: 3.731771999479638
distance from center 1 is: 6.290334770382815
distance from center 2 is: 3.382224740457218
distance from center 3 is: 7.132308122920062
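Output of this shape can be produced with scikit-learn's `KMeans.predict` and `KMeans.transform`, as in the sketch below. This is not the content of practice_blobs.py, which is not reproduced here. The blob data and the query point are arbitrary assumptions, so the printed numbers will differ from those above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four centers (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# An arbitrary query point, shaped (1, 2) because the estimator expects 2-D input.
point = np.array([[0.0, 0.0]])

# predict returns the index of the nearest center.
group = int(kmeans.predict(point)[0])
# transform returns the Euclidean distance from the point to every center.
distances = kmeans.transform(point)[0]

print(f"predicted group: {group}")
for i, d in enumerate(distances):
    print(f"distance from center {i} is: {d}")
```

The predicted group is the center with the smallest distance, which matches the sample output above: group 2 has the smallest of the four listed distances.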