This explains k-means better than any other resource that I've seen online even after 3 years. Thanks for sharing your knowledge.
Bros, please can you explain this to me? I have watched this several times and there is a part I still don't understand.
This is the clear and concise explanation I was looking for. Thank you.
Understood clearly. I am learning data science. I was googling for months for a simple explanation of this kind. Finally got it. Thanks...
Perfect! I've been looking for this explanation by googling for more than 3 hours! Thanks!
Thank you, this is the explanation I was searching for about k means clustering.
I had an AI & ML module for my degree program and I had such a hard time understanding what is happening in k-means clustering. This video helped me understand precisely what is happening in k-means. Thank you so much.
I collect all clear explanations. This video goes into my collection))! Good job!
This was a most excellent tutorial!
This is so helpful. Just at the end, I wish you had explained how to make sense of the results. What does it mean that obs 1 and 2 are in cluster 1? Thanks
Hi Shokoufeh! I have just understood the method with your video. Thank you very much!!!
...in (Mexican) Spanish: What an awesome explanation - congratulations!
Nice explanation. Very helpful
Wow! Absolutely brilliant!
Thank you, finally a clear and simple video
Excellent video! Thank you very much, it helped me a lot with my homework! Thanks again!!
Thank you so much, I was really confused on this topic
A very good explanation.... this saved me!
It was great and perfect. Thanks a lot.
Well explained, Thank you for simplifying it.
Thank you! Very well explained!
thank you so much, it is very helpful!!
Thank you, well explained...
Thank you, please keep going!
Very great explanation in both this and the other video!
Absolutely fantastic :) Really appreciate your efforts explaining the concept so simply and clearly.
Can I get the Excel file taught in the video for my reference?
Sorry, new to this. When you're manually deciding between cluster 1 or 2 for the 6 rows, what if you have 600 lines? What formula would you use to decide which cluster? What if you have 10 clusters?
=match(min(k3:l3),k3:l3,0) This formula outputs the column position of the minimum value in the range, and therefore tells you which cluster the minimum distance belongs to.
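For anyone who prefers code to spreadsheet formulas, here is a minimal Python sketch of the same idea as the MATCH/MIN formula. The function name and the distance table are made up for illustration; it assumes the distances to each cluster center have already been computed, one column per cluster.

```python
def nearest_cluster(distances):
    """Return the 1-based position of the smallest distance,
    mirroring MATCH(MIN(range), range, 0) on one spreadsheet row."""
    return distances.index(min(distances)) + 1

# Hypothetical distance table: one row per observation, one column per cluster.
distance_rows = [
    [0.5, 3.2],
    [4.1, 0.9],
    [2.0, 2.0],  # tie: the first matching column wins, as with MATCH
]

labels = [nearest_cluster(row) for row in distance_rows]
print(labels)  # [1, 2, 1]
```

The same function works unchanged for 600 rows or 10 clusters, since it only looks at however many columns each row has.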
This is insanely good
settle down Kansal
@@AStCG1989 sorry ?
Very nice demonstration, better than the professor. It remains to automate this with a macro! What do you think?
Pretty much clear. Appreciated! I have a question for you. Is there any easy way to do hierarchical clustering in Excel? I do not know how many clusters I will have, so I would like to find the best number of clusters according to my dendrogram. Regards!
Hi Navid, it might be too late for you, but I just made a video on HC. Please find it in my data analytics playlist.
Greetings from Mexico 🇲🇽
For those of us with vision problems, which will become more and more of a thing due to cell phone usage and blue light emitted by screens which penetrates all the way to the back of the eye causing vision problems, could you from now on zoom in on those formulas and the charts, please?
This is very helpful, thank you!
Ma'am, I am using this for 7 groups and 92 values. What should k be?
When I take k=4, how do I check the distance? After applying the ABS formula and randomly choosing 4 centers, I am getting a few values exactly equal.
This was very useful. Thank you. Which type of chart should we use to draw the clusters? A scatter plot? For four variables? I am new to all this.
Congratulations!
thank you very muchhh!!
Well done, you explained it well.
I like this video. Dr. Shokoufeh, may I ask whether you have the same demonstration for Fuzzy C-Means (FCM)? Many thanks.
It will be very similar, except that you deal with fuzzy numbers as opposed to real numbers. You just need to use fuzzy math to find the distances as well as the CG for each cluster.
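A possibly useful footnote for readers: in the common textbook formulation of fuzzy c-means, as far as I know, the data and distances stay ordinary real numbers and the "fuzzy" part is that each point gets a degree of membership in every cluster. The sketch below shows that membership calculation for one point. It is an assumption that this is the variant meant in the reply above; the function name and the fuzzifier default m=2 are my own choices.

```python
def fuzzy_memberships(distances, m=2.0):
    """Degrees of membership of one point in each cluster, given its
    distances to the cluster centers (standard fuzzy c-means formula)."""
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0 for d in distances):
        hits = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in distances)
            for d_i in distances]

# Hypothetical point that is 1 unit from center 1 and 3 units from center 2.
u = fuzzy_memberships([1.0, 3.0])
print([round(x, 3) for x in u])  # [0.9, 0.1]
```

The memberships always add up to 1, and the closer center gets the larger share, which is the soft counterpart of picking the minimum distance in k-means.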
I have 3 centers. I tried only 2 iterations but did not get the same result. Should I now try a 3rd iteration or not?!
This is great. Thanks
Great explanation. I understood how each iteration changes the center point of each cluster. Could you explain how to apply this iterative solution using the Excel Solver tool? Thank you
I am afraid I do not understand the question. Excel Solver solves optimization problems, not clustering.
Thanks for answering. I read in John W. Foreman's book "Data Smart: Using Data Science to Transform Information into Insight" that it is possible to use the solver tool in Excel, not to form clusters, but to optimize the positions of the centroids of each cluster by performing multiple iterations. This would save a lot of work for large volumes of data.
There are many variations out there. To do that, you need to first learn the specific method and use Solver in combination with the method explained here. There are many videos on YouTube on how to solve a linear programming problem using Solver. Hope this helps.
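To make the Solver idea a bit more concrete, here is a rough Python sketch of the kind of quantity an optimizer could be asked to minimize: the total distance from every point to its nearest centroid, with the centroid coordinates as the values being adjusted. I have not checked the exact setup in the book, so treat the objective, the absolute-difference distance, and all names and numbers here as assumptions.

```python
def manhattan(p, q):
    """Sum of absolute differences, in the spirit of the ABS formula."""
    return sum(abs(a - b) for a, b in zip(p, q))

def total_distance(points, centroids):
    """Objective: each point contributes its distance to the nearest centroid."""
    return sum(min(manhattan(p, c) for c in centroids) for p in points)

# Made-up data and two candidate sets of centroid positions.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
good = total_distance(points, [(1.0, 1.5), (8.5, 8.0)])
bad = total_distance(points, [(0.0, 0.0), (5.0, 5.0)])
print(good, bad)  # 2.0 18.0
```

An optimizer would keep moving the centroid coordinates to push this number down, which is the same goal the manual iterations in the video work toward.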
Thank you
If you add a 7th data point, do you have to repeat the process again?
Very helpful
Thanks, this was helpful!
Thanks. But how do you predict the cluster for a new observation?
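One common way to handle a new observation, sketched below in Python, is to keep the final centroids and assign the new row to whichever one is nearest. This assumes the absolute-difference distance mentioned in the comments; the centroid values and the function name are invented for illustration.

```python
def assign_new_point(point, centroids):
    """Return (1-based cluster number, list of distances) for a new observation."""
    distances = [sum(abs(a - b) for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances)) + 1, distances

# Made-up final centroids for a 4-variable data set.
final_centroids = [(5.0, 3.4, 1.5, 0.2), (6.3, 2.9, 5.0, 1.7)]
new_obs = (5.1, 3.5, 1.4, 0.2)

cluster, dists = assign_new_point(new_obs, final_centroids)
print(cluster)  # 1
```

This does not move the centroids. If the new point should also influence the cluster centers, the iterations would need to be run again with it included.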
It was good, thanks.
Can someone explain to me how she decided when to use cluster 1 or cluster 2, please?
It’s based on the distance of each point from the center of the cluster. You choose the min distance and see what cluster it’s associated with.
She was looking at the lowest value and entering the number of the column it came from (1 or 2). I didn't understand that part at first either.
Hi Dr. Shokoufeh, this is such a great tutorial, and I got a clear step-by-step explanation of how k-means clustering works. However, I stumbled on a problem while implementing this method with my data. Since we can freely choose the first centroids at random, I found that when I use different starting centroids (say, the 1st and 2nd data points on the first trial, then the 2nd and 4th data points on the second trial), I get different results in my final iteration. I would like to ask what the likely problem is in my case. Thank you very much.
Hi Radityo,
The problem is that you need to calculate the variation of the clusters in the final iteration (a step missing from the video). So the procedure is:
1. Randomly select centers.
2. Calculate distances, assign each point to its nearest center, and average the points in each cluster to get the new centers.
3. Repeat until no points change cluster.
4. (Not in this video.) Calculate the variation (how evenly split the points are across the clusters).
5. (Not in this video.) Repeat steps 1-4.
6. The best clustering is the one with the best variation. The trick is that the more you repeat steps 1-4, the closer you will come to the best clustering.
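The six steps above can be sketched in Python roughly as follows. This is only a sketch under my own assumptions: "variation" is taken to be the total squared distance from each point to its assigned center (lower is better), the distance is squared Euclidean rather than the ABS formula from the video because it pairs naturally with averaging, cluster labels are 0-based, and all names and data are made up.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance (an assumption, see the note above)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Column-wise average of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans_once(points, k, rng, max_iter=100):
    """Steps 1-4: one run from random starting centers."""
    centroids = rng.sample(points, k)              # step 1: random centers
    labels = None
    for _ in range(max_iter):
        # step 2a: assign each point to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:                   # step 3: stop when nothing moves
            break
        labels = new_labels
        # step 2b: move each center to the average of its points
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                            # keep the old center if a cluster empties
                centroids[j] = mean_point(members)
    # step 4: variation, assumed to be total squared distance to the assigned center
    variation = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return variation, labels, centroids

def best_of_restarts(points, k, restarts=20, seed=0):
    """Steps 5-6: repeat the whole run and keep the lowest-variation result."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])

# Made-up 2-D data with two obvious groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
variation, labels, centroids = best_of_restarts(data, k=2)
print(labels, round(variation, 3))
```

Different starting centers can end in different final clusters, which is why the restart loop keeps only the run with the lowest variation.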
Dear Shokoufeh, what if the ABS formula result for C1 and C2 is the same number? What is the minimum distance then?
In this case, it does not matter which one you choose. Assign the point to C1 or C2 randomly!
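A small Python sketch of the random tie-break described in this reply, in case it helps. The function name is hypothetical; it collects every cluster that shares the minimum distance and picks one of them at random.

```python
import random

def nearest_cluster_random_ties(distances, rng=random):
    """Return the 1-based cluster with the smallest distance,
    choosing at random among clusters that tie for the minimum."""
    smallest = min(distances)
    tied = [i + 1 for i, d in enumerate(distances) if d == smallest]
    return rng.choice(tied)

print(nearest_cluster_random_ties([3.0, 1.0]))       # 2, no tie
print(nearest_cluster_random_ties([2.0, 2.0, 5.0]))  # either 1 or 2
```

Always taking the first tied cluster instead of a random one is also a valid choice, since the point is equally close to both.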
Great. thanks
thank you so much !
thanks a lot..
Thank you !
What if the minimum ABS value is the same for two or more clusters? Which one should be taken?
you just pick one randomly.
Thanks,
Thanks
Hello, I need the algorithm in Java code. Can someone pass it to me?
Why should we perform this work manually?!!
How is single linkage used in k-means clustering? Isn't that a part of hierarchical clustering?
At the end of k-means, if you want to know how far apart your clusters are, you can use single linkage or the other methods discussed in the video.
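For reference, single linkage between two finished clusters is just the smallest distance between any point in one and any point in the other. Here is a minimal Python sketch, assuming the absolute-difference distance and using made-up clusters.

```python
def manhattan(p, q):
    """Sum of absolute differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(manhattan(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical final clusters from a k-means run.
cluster_1 = [(1.0, 1.0), (2.0, 1.0)]
cluster_2 = [(5.0, 1.0), (9.0, 4.0)]

gap = single_linkage(cluster_1, cluster_2)
print(gap)  # 3.0
```

Here the closest pair is (2, 1) and (5, 1), so the clusters are 3 units apart by this measure.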
I didn't understand it, damn.
Too long and complex of an explanation for a simple concept.
I wish you had included a visual at the end to show how the rows of 4 values are plotted, as I cannot get my head around that. Your other video using just X, Y coordinates makes a lot of sense to me and the plot at the end is useful, but here I cannot see how the data would be visualised, or how something with a value of 1.4, 0.2 can always be in the same cluster as something with a value of 5.1, 3.5.
You cannot show a 4-dimensional problem in a 3-D space!
@@sxmirzaei Actually that makes complete sense and I now understand what you are showing. I had not thought of it like that, as I didn't really understand the data, but thinking in terms of age, weight, height, and body fat as an example, I can see how it can be clustered.