Introduction to Cluster Analysis with R - an Example

  • Published 1 Feb 2025

COMMENTS • 1.2K

  • @DineshKumarT1990
    @DineshKumarT1990 8 років тому +27

    Great tutorial!!...the way you explain is easy to understand...you should do more like this

    • @bkrai
      @bkrai  8 років тому

      Thanks for the feedback!

    • @josebueno7602
      @josebueno7602 5 років тому +2

      Please, how can I get the data utilities.csv? Thanks.

  • @rarosification
    @rarosification 7 років тому +2

    My goodness, this video is so complete, and clearly explained with details of the script... Thank you so very much... 100 points to you...!! You have a new fan...

    • @bkrai
      @bkrai  7 років тому

      Thanks :)

  • @NotTheSharpestKnife-mh
    @NotTheSharpestKnife-mh 6 років тому +6

    This is an excellent tutorial -- well presented and thorough. I followed along with my own application example (country healthcare per capita expenditure versus infant mortality rates of various types) and got very interesting results.

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments and feedback!

  • @sebastiansocianu5441
    @sebastiansocianu5441 4 роки тому +2

    5-star explanation. thank you! Very much recommended for beginners and intermediate R users. You got a new follower!

    • @bkrai
      @bkrai  4 роки тому

      Awesome, thank you!

  • @ArcenisRojas
    @ArcenisRojas 8 років тому +4

    Great tutorial. I really like how you stuck to explaining the steps through a practical application. Thank you for this.

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @ramasamythirunavukkarasu6777
    @ramasamythirunavukkarasu6777 3 years ago +1

    Thank you so much, Dr. B. Rai. I am inspired by your way of teaching, even online. Hopefully everyone is enjoying your teaching.

    • @bkrai
      @bkrai  3 роки тому

      You are welcome!

  • @karoargote
    @karoargote 4 роки тому +3

    Really thank you so much!!! The best tutorial on this topic!!!

    • @bkrai
      @bkrai  4 роки тому

      You're very welcome!

  • @kanikalungani
    @kanikalungani 6 років тому +1

    If i had a thousand likes you would have received them all sir. Love the way you have explained and covered the concepts

    • @bkrai
      @bkrai  6 років тому

      Thanks, I’ll consider it 1000😊

  • @stephenhobbs948
    @stephenhobbs948 8 років тому

    Excellent explanation and code. I took the Johns Hopkins data science course, and clustering was part of the course. This video really helps explain the concept.

    • @bkrai
      @bkrai  8 років тому

      +Stephen Hobbs thanks 👍

  • @rupeshbharadwaj
    @rupeshbharadwaj 6 років тому +2

    Great tutorial! You are really helping a lot of people like me, and the best part is- drama, background music etc are completely missing unlike many other tutorials. Also saw some bhojpuri songs :)...thank you sir!

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments and feedback!

  • @markshanks9142
    @markshanks9142 5 років тому +1

    This is truly an excellent, clear and concise tutorial. You covered a lot of topics in a short amount of time. I will be watching your other videos. Well done!

    • @bkrai
      @bkrai  5 років тому

      Thanks for your comments and feedback!

  • @jonathanrhein7553
    @jonathanrhein7553 8 років тому

    Hi Bharatendra, great video - really helpful!
    Everything goes well until the point of doing the scree plot, I am getting:
    > withinGroupSumOfSquares = (nrow(normNum)-1) * sum(apply(normNum, 2, var, na.rm=TRUE))
    > for(i in 2:20) withinGroupSumOfSquares[i] = sum(kmeans(normNum, centers=i)$withinss)
    Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
    > plot(1:20, withinGroupSumOfSquares, type="b", xlab = "Number of Clusters", ylab = "Within group SS")
    Error in xy.coords(x, y, xlabel, ylabel, log) :
    'x' and 'y' lengths differ
    Can you help me? Thank you.

    • @jonathanrhein7553
      @jonathanrhein7553 8 років тому

      someone has deleted my comment...

    • @bkrai
      @bkrai  8 років тому

      +Jonathan Rhein Not sure what's causing the error you got. May have something to do with data. I ran my data using the code you have, and everything seems fine.

    • @bkrai
      @bkrai  8 років тому

      +Jonathan Rhein I still see your previous comment.
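
A possible explanation for the error reported above: `kmeans()` stops with `NA/NaN/Inf in foreign function call` when its input contains missing or non-finite values. The loop then never fills `withinGroupSumOfSquares` beyond its first element, which would explain the later `'x' and 'y' lengths differ` message from `plot()`. The sketch below is only an illustration under assumptions: the commenter's `normNum` is not available, so the built-in `USArrests` data stands in for it, a missing value is injected on purpose, and `cleanNum`, `maxK` and `wss` are names introduced here.

```r
# Stand-in data: the commenter's normNum is not available, so USArrests is used.
normNum <- scale(USArrests)
normNum[3, 2] <- NA                      # simulate a missing value

# kmeans() rejects NA/NaN/Inf, so count the problem cells and drop those rows.
bad_cells <- sum(!is.finite(normNum))
cleanNum <- normNum[complete.cases(normNum), ]

set.seed(123)                            # kmeans starts from random centers
maxK <- 10
wss <- numeric(maxK)
# For one cluster, the within-group SS equals the total SS.
wss[1] <- (nrow(cleanNum) - 1) * sum(apply(cleanNum, 2, var))
for (k in 2:maxK) {
  wss[k] <- sum(kmeans(cleanNum, centers = k, nstart = 10)$withinss)
}

# x and y now have the same length, so the scree plot can be drawn.
plot(1:maxK, wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within group SS")
```

Whether dropping incomplete rows is appropriate depends on the data; a column that is constant before normalization (and so becomes NaN after dividing by a zero standard deviation) would need to be removed instead.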

  • @janelutken9818
    @janelutken9818 4 роки тому +1

    Thank you so much. This was easy to follow and I did my own analysis as we went along with almost no trouble. This was a breakthrough video for me.

    • @bkrai
      @bkrai  4 роки тому

      You are welcome! For more detailed presentation, you may refer to:
      ua-cam.com/video/otjWCaMcVaA/v-deo.html

  • @archeops.
    @archeops. 5 років тому +1

    Fantastic explanation! I followed along with a different dataset and it worked perfectly! Great work!!

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments!

  • @emiltsenov7853
    @emiltsenov7853 8 років тому +1

    Hi Bharatendra, this is an excellent tutorial - the first one that worked for me. Great effort, keep up the good work!

    • @bkrai
      @bkrai  8 років тому

      +Emil Tsenov Good to know, thanks for feedback!

  • @tradingtraveller05
    @tradingtraveller05 8 years ago +1

    Thanks for such a wonderful explanation.
    By the way, I was working on a similar dataset, and apply() didn't work for me. Even though I had removed all the character vectors, the numeric vectors were still returning NA. I used sapply() instead and that solved the problem.
    Thanks again!!

    • @bkrai
      @bkrai  8 років тому

      Good to hear!
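
One plausible cause of the `apply()` behaviour described above (an assumption, since the commenter's data is not shown) is that a non-numeric column was still present: `apply()` first converts a data frame to a matrix, and a single character column turns every value into text. The toy data frame `df` below is hypothetical and only illustrates the difference.

```r
# Hypothetical toy data with one leftover character column.
df <- data.frame(name = c("a", "b", "c"),
                 x = c(1, 2, 3),
                 y = c(10, 20, 30))

# apply() converts the data frame to a character matrix, so mean() returns NA.
m_apply <- suppressWarnings(apply(df, 2, mean))

# sapply() visits each column separately, so numeric columns keep their type.
num_cols <- sapply(df, is.numeric)
m_sapply <- sapply(df[, num_cols], mean)

m_apply
m_sapply
```

Checking `str(df)` before normalizing is a quick way to confirm that only numeric columns remain.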

  • @sarahroffe2142
    @sarahroffe2142 6 років тому +1

    This is a brilliant tutorial which is easy to understand and follow.

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments!

  • @arnab_jana
    @arnab_jana 8 років тому +1

    After a long time, I have seen such a good tutorial. Thanks, for your effort

    • @bkrai
      @bkrai  8 років тому

      +Arnab Jana Thanks for the feedback!

  • @kapilrana1153
    @kapilrana1153 3 роки тому +1

    Great Explanation!
    Thank you Sir For this Video Lecture
    I will be watching your other videos.

    • @bkrai
      @bkrai  3 роки тому

      Thanks and welcome!

  • @harikamacharla7005
    @harikamacharla7005 7 років тому +1

    Wah!!! how could u explain it so well!! Great job.

    • @bkrai
      @bkrai  4 роки тому

      Thanks!

  • @ssundaraju
    @ssundaraju 6 years ago +1

    Very informative, with great slides and explanations. The delivery and presentation were good. I will be viewing other videos produced by Edureka. Some suggestions: show more examples, and present the limitations and good-fit scenarios for K-means clustering.

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments and feedback!

  • @kishoreyarramshetty2930
    @kishoreyarramshetty2930 3 роки тому +1

    Good Job in explaining the content along with code..

    • @kishoreyarramshetty2930
      @kishoreyarramshetty2930 3 роки тому +1

      can u provide us the link to download the dataset in this video to run the code.

    • @bkrai
      @bkrai  3 роки тому

      Thanks for comments!

    • @bkrai
      @bkrai  3 роки тому

      For data, there should be a link below this:
      ua-cam.com/video/otjWCaMcVaA/v-deo.html

  • @zhuziyan9454
    @zhuziyan9454 6 років тому +2

    dear professor, I am so lucky to know you. could you also update full tutorial about using rmd and advanced model like hmm? Thank you and wish you have a great day

    • @bkrai
      @bkrai  6 років тому

      Thanks for the suggestion, I've added this to my list.

  • @rosestube1233
    @rosestube1233 8 років тому +1

    Thank you for this tutorial! it's amazingly easy to follow and thanks a lot for the script/file

    • @bkrai
      @bkrai  8 років тому

      +Roses Tube 👍

  • @ste6826
    @ste6826 3 роки тому +1

    One suggestion to improve the video - When you click buttons and such please can you do it slowly so people can see where you click. Also perhaps consider using a highlight icon for your mouse? I had to watch 4 times before I realised you had pressed the 'run' button in the middle.

    • @bkrai
      @bkrai  3 роки тому

      Thanks for the suggestion!

  • @DeepeshSinghAndroid
    @DeepeshSinghAndroid 8 років тому +1

    Hi Mr. Rai, great tutorial. Thanks for your effort. Just wanted to understand more about these 2 methodologies. Why and when we apply different methodologies i.e. K means and Hierarchy. It will be great help if you can make separate videos for the same. Also, as lots of people requested for data set and you have already uploaded to Dropbox, could you please share the link in your description for everyone's benefits. Thanks again :)

    • @bkrai
      @bkrai  8 років тому

      Initially we try all methods and finally choose the one that seems more meaningful for the dataset used. It's difficult to say which method will work best beforehand. Also thanks for your feedback and suggestions.

  • @TusharLapani
    @TusharLapani 8 years ago +1

    Thanks Bharatendra. Can you please upload a video on how to perform clustering when the dataset has a number of numerical attributes and categorical attributes? In this video you are eliminating the categorical attribute. What would you have done if your dataset had 10 numeric columns and 8 categorical columns?
    I appreciate your knowledge contribution.

    • @bkrai
      @bkrai  8 років тому +1

      +Tushar Lapani For cluster analysis you must have quantitative variables. You can use categorical variables after cluster analysis to see if they show any pattern with identified clusters and use it for characterizing the clusters.
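
The idea in the reply above, clustering on the quantitative variables and then checking a categorical variable against the resulting clusters, can be sketched as follows. This is a minimal illustration, not the video's script: the built-in `iris` data stands in for the commenter's dataset, and the choice of complete linkage and `k = 3` is an assumption.

```r
# Stand-in data: iris has four numeric columns and one categorical column.
z <- scale(iris[, 1:4])                  # cluster on the numeric variables only
hc <- hclust(dist(z), method = "complete")
member <- cutree(hc, k = 3)              # cluster membership for each row

# Cross-tabulate cluster membership against the categorical variable
# to see whether any category concentrates in particular clusters.
tab <- table(cluster = member, species = iris$Species)
print(tab)
```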

  • @bassamal-kaaki3253
    @bassamal-kaaki3253 4 роки тому +1

    Lovely explanation:) easy to absorb.

    • @bkrai
      @bkrai  4 роки тому

      Thanks for comments!

  • @ahmetcandemir7032
    @ahmetcandemir7032 4 роки тому +1

    Very good tutorial ! impressively well explained. Thank you

    • @bkrai
      @bkrai  4 роки тому

      You are welcome!

  • @shezamalik7918
    @shezamalik7918 2 роки тому +1

    hello sir, great tutorial, you're a life saver for marketing analytics course!
    I have a question regarding Scree plot code:
    wss

    • @bkrai
      @bkrai  2 роки тому +1

      it tries 1 to 20 clusters.

    • @shezamalik7918
      @shezamalik7918 2 years ago +1

      @@bkrai Oh right, thanks a lot! Can you also tell me how to deal with a gender variable for clustering? What I'm doing is mutating a new variable that is 1 and 0 instead of male and female. I then convert that to a numeric variable and follow the usual process. Is this correct?

    • @bkrai
      @bkrai  2 роки тому

      For clustering, we should use only numeric variables.

    • @shezamalik7918
      @shezamalik7918 2 роки тому +1

      @@bkrai so how should i deal with gender? Its an important variable in marketing for ad targeting etc

    • @bkrai
      @bkrai  2 роки тому

      you can put that on Dendrogram after clustering to see if it shows any pattern.
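
A rough sketch of the suggestion above: leave the categorical variable out of the distance calculation and use it as the leaf labels of the dendrogram. No gender data is available here, so the two-level transmission column `am` of the built-in `mtcars` data stands in for it; the variable choice and the name `group` are assumptions made for illustration.

```r
# Cluster on numeric variables only.
z <- scale(mtcars[, c("mpg", "hp", "wt", "qsec")])
hc <- hclust(dist(z))

# Two-level categorical variable, kept out of the clustering itself.
group <- ifelse(mtcars$am == 1, "manual", "automatic")

# Use the category as leaf labels to see whether it lines up with the branches.
plot(hc, labels = group, hang = -1, cex = 0.7)
```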

  • @Nit1601
    @Nit1601 2 роки тому +1

    THE BEST !!! Could you please advise, do we need to do anything else to normalize if we are dealing with Binary columns (0,1). Thanks !

    • @bkrai
      @bkrai  2 роки тому +1

      We should exclude such variables.

  • @betzthomas9693
    @betzthomas9693 5 years ago +1

    Thank you, Sir, for the tutorial. Please explain whether there is any package in R to identify on what basis the clusters are grouped from the data we provide.

    • @bkrai
      @bkrai  5 років тому

      Refer to the averages for each cluster and all variables.

  • @saikrishna2589
    @saikrishna2589 7 років тому +1

    Thank you for wonderful explanation. Appreciate your help with these amazing videos

    • @bkrai
      @bkrai  7 років тому +1

      Thanks for your feedback!

  • @tanmay094
    @tanmay094 4 роки тому +1

    Nice and informative tutorial sir.
    I am performing hierarchical clustering on my dataset with 10 variables and 200 observations. But the output is not very interpretable.
    Please suggest how can I make it more interpretable.
    Thanks.

    • @bkrai
      @bkrai  4 years ago

      You can explore other clustering methods and see if they provide better insights. Here is the link:
      ua-cam.com/play/PL34t5iLfZddvMPAl1TzHJ_GjQcD3s6w_Z.html

    • @tanmay094
      @tanmay094 4 роки тому +1

      @@bkrai Thanks, sir.
      I have one more query. I want to do cluster analysis on PCA.
      Can you please suggest a good reference tutorial for doing that?

    • @bkrai
      @bkrai  3 роки тому

      This approach will work fine.
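
A minimal sketch of clustering on principal components, as asked about above. It assumes the built-in `USArrests` data as a stand-in, and keeping two components and three clusters is an arbitrary choice for illustration rather than a recommendation.

```r
# PCA on standardized variables.
pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)
summary(pc)$importance                   # variance explained by each component

# Keep the first two component scores (an illustrative choice).
scores <- pc$x[, 1:2]

set.seed(123)                            # kmeans starts from random centers
kc <- kmeans(scores, centers = 3, nstart = 25)
table(kc$cluster)                        # cluster sizes
```

Because the clusters are formed in component space, the cluster averages are easier to interpret if they are computed on the original variables afterwards.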

  • @saikatkar547
    @saikatkar547 4 роки тому +1

    thats really excellent explanation!

    • @bkrai
      @bkrai  4 роки тому +1

      Glad it was helpful!

  • @mariaamithapennington3737
    @mariaamithapennington3737 4 роки тому +1

    Thank you so much for the tutorial. It is extremely helpful. But my question like the other is that it would have been very kind of you if you would have linked your data set too. Thanks!

    • @bkrai
      @bkrai  4 роки тому +1

      You can get it from here: ua-cam.com/video/otjWCaMcVaA/v-deo.html

    • @mariaamithapennington3737
      @mariaamithapennington3737 4 роки тому +1

      @@bkrai Thank you very much! Appreciate it! :)

    • @bkrai
      @bkrai  4 роки тому +2

      You are welcome!

  • @alicelatimier3133
    @alicelatimier3133 4 роки тому +2

    Thank you so much for your amazing videos, everything is so clear and practical :) From a french research in cognitive science, I have one tricky question for you : i would like to find the best classifier/cluster analysis for repeated measures dataset (i.e., multiple repeated measures for one subject on the same features, as this is the case in experimental psychology research for example, or in longitudinal studies). Best

    • @bkrai
      @bkrai  4 роки тому

      You can look into this link:
      ua-cam.com/play/PL34t5iLfZddvMPAl1TzHJ_GjQcD3s6w_Z.html

  • @tahzeebfatima3121
    @tahzeebfatima3121 6 років тому +1

    Thanks for the informative video. May I please know how to deal with dichotomous variables along with continuous variables in the data if we want to include both in one cluster analysis, how do we do it please?

    • @bkrai
      @bkrai  4 роки тому

      This link has more cluster analysis topics:
      ua-cam.com/video/otjWCaMcVaA/v-deo.html

  • @abdulkhader101
    @abdulkhader101 6 років тому +1

    You are a great teacher sir, you are really awesome

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments!

  • @gulapakarthik3864
    @gulapakarthik3864 4 роки тому +1

    This is really Amazing...Thank you so much 😎

    • @bkrai
      @bkrai  4 роки тому

      You are welcome!

  • @thejuhulikal6290
    @thejuhulikal6290 4 роки тому +2

    sir please make the video on this K-mode also, that would be great to understand both topics and comparison

    • @bkrai
      @bkrai  3 роки тому

      Thanks, I've added it to my list.

  • @anigov
    @anigov 6 years ago +1

    Dear Sir, thank you for the time and effort that you have put in to make this wonderful video tutorial.
    I have a query. At 12:27, how are the original average values displayed even though member.c is used, which is obtained through a series of calculations using the normalised data?
    Why did you not use PCA to decide the number of clusters for k-means?
    Regards
    Aniruddh

    • @bkrai
      @bkrai  6 років тому +1

      In the 2nd aggregation line, note that I've used utilities. That's the reason we can display original values. In the 1st aggregation, z was used. Also, here focus was on clustering, so pca is not used.

    • @anigov
      @anigov 6 років тому +1

      Thank you
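
The point in the reply above is that cluster membership is just a vector of labels, so it can be used to average either the normalized data or the original data. A hedged sketch, assuming `USArrests` in place of the video's `utilities` data and using `member` where the video uses `member.c`:

```r
utilities <- USArrests                   # stand-in for the video's data frame
z <- scale(utilities)                    # normalized copy used for clustering
hc <- hclust(dist(z))
member <- cutree(hc, k = 3)

# Averages in standard-deviation units (computed on the normalized data).
avg_z <- aggregate(as.data.frame(z), list(cluster = member), mean)

# Averages in original units: same membership vector, original data frame.
avg_orig <- aggregate(utilities, list(cluster = member), mean)

avg_z
avg_orig
```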

  • @prashantmishra2094
    @prashantmishra2094 5 років тому +1

    nice tutorial Sir. Keep making such videos

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments!

  • @hridayborah9750
    @hridayborah9750 5 years ago +1

    Yes, all your videos are helpful. Could you prepare a tutorial on machine learning in the tidyverse?

    • @bkrai
      @bkrai  5 років тому

      I've added it to list of future videos. Thanks!

  • @liamhannah6325
    @liamhannah6325 6 років тому +1

    This was really helpful THANK YOU! Make more! I would love it if you showed us how to do Latent Class Analysis in R, its not obvious right now

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments and suggestion!

  • @fredpoole6373
    @fredpoole6373 6 років тому +1

    Great Video! Look forward to more videos!

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments! For more machine learning videos you can use this link: goo.gl/WHHqWP

  • @mwambakapambwe2382
    @mwambakapambwe2382 5 років тому +1

    Fantastic presentation. Very helpful

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments!

  • @zhuziyan9454
    @zhuziyan9454 7 years ago +1

    Could you please explain why you remove the first variable with [,-c(1,1)] rather than [,-1]? Thank you

    • @bkrai
      @bkrai  7 years ago

      Both work fine. The c() form is useful if you need to remove more than one variable.
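
A small check of the two indexing forms discussed above, using a hypothetical data frame `df` whose column names are made up for illustration:

```r
# Hypothetical data frame with an identifier in the first column.
df <- data.frame(Company = c("A", "B", "C"), x1 = 1:3, x2 = 4:6, x3 = 7:9)

a <- df[, -1]                # drop column 1
b <- df[, -c(1, 1)]          # repeated negative index drops column 1 once
identical(a, b)              # TRUE

d <- df[, -c(1, 3)]          # the c() form can drop several columns at once
names(d)
```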

  • @desisto007
    @desisto007 7 років тому

    Thank you so much! Very well explained.
    I would like to ask you if I still can use the Euclidian distance to find the closest elements of a cluster center, even if I use a dimensionality reduction approach (such as PCA, T-sne) that uses probabilities to arrange clusters in 2 dimension before using K-means.

  • @nafinks6081
    @nafinks6081 7 років тому +1

    Excellent tutorial! very easy to grasp.

    • @bkrai
      @bkrai  7 років тому

      +Nafin Ks thanks for the feedback!

  • @rithishvikram1759
    @rithishvikram1759 5 years ago +1

    Nice explanation, sir! Thank you so much, great respect. Sir, if you could, please attach the relevant datasets to the video. Thank you once again.

  • @Aminah6623
    @Aminah6623 4 роки тому +1

    Wow. This was extremely helpful. Thank you.

    • @bkrai
      @bkrai  4 роки тому

      You're very welcome!

  • @shubhasmitasahani1738
    @shubhasmitasahani1738 5 років тому +1

    Hello Sir, do you have any video on latent class clustering in R? Please share...Looking forward.

    • @bkrai
      @bkrai  5 років тому

      Not yet, but I'm adding this to my list for future. For clustering related videos, you may refer to this link:
      ua-cam.com/play/PL34t5iLfZddvMPAl1TzHJ_GjQcD3s6w_Z.html

  • @santosacosta4645
    @santosacosta4645 6 років тому +1

    Thank you very much sir. Question: using Within group SS plot (min 14:39), isn't the optimal number of clusters 5? the variability from 4 to 5 seems very significant. Please let me know.

    • @bkrai
      @bkrai  6 років тому +2

      This data has only 22 companies. As we increase number of clusters, number of companies in some clusters becomes really small, to the extent that a cluster may contain just one company. So the choice of 'k' should also consider this aspect.
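
The trade-off described above can be inspected by listing the cluster sizes for several values of k alongside the scree plot. A minimal sketch, assuming the built-in `USArrests` data (50 rows) in place of the 22-company dataset; `sizes` is a name introduced here.

```r
z <- scale(USArrests)
set.seed(123)                            # kmeans starts from random centers

# Sorted cluster sizes for k = 2..6; very small clusters show up at once.
sizes <- lapply(2:6, function(k) {
  sort(kmeans(z, centers = k, nstart = 25)$size)
})
names(sizes) <- paste0("k=", 2:6)
sizes
```

With only a few dozen observations, a k that produces clusters of one or two members may be hard to justify even if the within-group sum of squares keeps dropping.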

  • @ramp2011
    @ramp2011 7 років тому +1

    Great tutorial. Thank you... How do you handle categorical variables for clustering? In this example looks like you removed the 1st column that happened to be a factor variable. Can you please post the data file used in the comments as well if possible? Thank you

    • @bkrai
      @bkrai  7 років тому

      Cluster analysis only works with quantitative variables. During the analysis you may note that we calculate distances, which we cannot do with categorical variables. But after finalizing number of clusters, you can plot dendrogram with a categorical variable to see if there is any obvious pattern or not.
      For data, send email id.

    • @Jorge-vp7of
      @Jorge-vp7of 7 років тому

      you can use K-modes to do clustering with categorical data

    • @medardkafoutchoni6511
      @medardkafoutchoni6511 6 років тому

      Thank you dear Sanchez. What about mixed data (i.e. including both numerical and categorical variables)?

    • @vivekwilliam3370
      @vivekwilliam3370 6 років тому

      vivek4u.3048@gmail.com

  • @EduardoFrancoChalco
    @EduardoFrancoChalco 8 років тому +1

    Really great tutorial, thank you very much!

    • @bkrai
      @bkrai  8 років тому

      +Eduardo Franco Chalco 👍

    • @EduardoFrancoChalco
      @EduardoFrancoChalco 8 років тому

      Would you please send me the scrip and data? email: efranco1@uc.cl

  • @springANDstorm
    @springANDstorm 5 років тому +1

    Sir, how to interpret the between SS/total SS value? In your example, it's 36% . How should that be interpreted?

    • @bkrai
      @bkrai  5 років тому +1

      Between SS captures variability between clusters. When it increases, it indicates better clustering because within cluster variability will come down. Elements within a cluster should be closer to each other whereas elements between clusters should be further away for a good cluster formation.

    • @springANDstorm
      @springANDstorm 5 років тому +1

      @@bkrai thanks Sir.
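
The percentage printed by `kmeans()` is the ratio of two components of the fitted object, `betweenss` and `totss`. A short sketch of how it is obtained, assuming `USArrests` as stand-in data and three clusters:

```r
z <- scale(USArrests)
set.seed(123)
kc <- kmeans(z, centers = 3, nstart = 25)

# Total SS splits into a within-cluster part and a between-cluster part.
kc$totss
kc$tot.withinss + kc$betweenss

# Share of total variability captured by the separation between clusters.
ratio <- kc$betweenss / kc$totss
round(100 * ratio, 1)
```

The ratio always rises as k increases, so it is more useful for comparing solutions than as an absolute measure of quality.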

  • @betzthomas9693
    @betzthomas9693 5 years ago +1

    Can you please explain, for k-means clustering (scree plot), what the idea behind the wss calculation is?

    • @bkrai
      @bkrai  5 years ago

      wss is the within-cluster sum of squares, which captures within-cluster variability. When wss is low, cluster formation is good.

    • @betzthomas9693
      @betzthomas9693 5 років тому

      Thank you @@bkrai
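
To make the wss definition above concrete, the within-cluster sum of squares can be recomputed by hand as the squared distance of each point from its own cluster center and compared with what `kmeans()` reports. This is a sketch on stand-in data (`USArrests`); `manual_wss` is a name introduced here.

```r
z <- scale(USArrests)
set.seed(123)
kc <- kmeans(z, centers = 3, nstart = 25)

# For each cluster: subtract the cluster center from its points,
# square the differences and add them up.
manual_wss <- sapply(1:3, function(j) {
  pts <- z[kc$cluster == j, , drop = FALSE]
  sum(sweep(pts, 2, kc$centers[j, ])^2)
})

manual_wss
kc$withinss                              # same values as reported by kmeans()
```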

  • @txigual
    @txigual 5 років тому +1

    Thank you so much, very useful video.

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments!

  • @keeninterest8889
    @keeninterest8889 5 років тому +1

    Sir, Can you please tell me whether it is necessary to do normalization to qualitative data?

    • @bkrai
      @bkrai  5 років тому

      No you don’t need it for qualitative variables.

    • @keeninterest8889
      @keeninterest8889 5 років тому

      @@bkrai Thank you sir

  • @sanjayh3897
    @sanjayh3897 8 років тому +1

    Excellent tutorial Bharatendra ! Do you have any example to share for Overlapping clustering - would appreciate it.
    Thanks !

    • @bkrai
      @bkrai  8 років тому

      There are 52 datasets where clustering can be applied in the link below:
      archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table

  • @kandreitapomen
    @kandreitapomen 8 років тому +1

    Great tutorial. Thank you very much!

    • @bkrai
      @bkrai  8 років тому

      +Kandreitapomen 👍

  • @omkarsingh6060
    @omkarsingh6060 5 років тому +1

    Amazing...Really impressed

    • @bkrai
      @bkrai  5 років тому

      Thanks for comments!

  • @metalhealth14
    @metalhealth14 8 років тому +1

    this is a really great detail thank you! I appreciate the detailed guidance into understanding and checking cluster membership

    • @bkrai
      @bkrai  8 років тому

      It's good to hear your feedback! Thanks

  • @Guavarosa
    @Guavarosa 5 років тому +1

    Please can you give me a hint? I want to give as input the initial centres for kmeans clustering. I just do not manage to select these points out of my dataset. Thank you in advance for your help!

    • @bkrai
      @bkrai  5 років тому

      Why do you need that? The algorithm should automatically take care of finding the best clusters.

    • @Guavarosa
      @Guavarosa 5 років тому

      @@bkrai Because I try to correlate my clusters to the physical problem. That is why I was wondering if I can give initial centres as in case of software Origin Pro. I appreciate your answer.
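
For reference, the `centers` argument of `kmeans()` accepts either a number of clusters or a matrix of initial cluster centres, one row per cluster. A minimal sketch, assuming `USArrests` as stand-in data; the three rows picked as starting centres are an arbitrary, hypothetical choice.

```r
z <- scale(USArrests)

# Pick rows of the data as initial centres (they must be distinct).
init <- z[c("Alabama", "Iowa", "Nevada"), ]

# Passing a matrix fixes the starting points; k is the number of rows.
kc <- kmeans(z, centers = init)

kc$centers                               # final centres after the iterations
table(kc$cluster)
```

The algorithm still moves the centres during its iterations, so the final centres generally differ from the supplied ones.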

  • @rohanshetty1016
    @rohanshetty1016 5 років тому +1

    Sir your video lectures are really awesome! Excellent Tutorial!
    Can you please share the csv file used for cluster analysis?

    • @bkrai
      @bkrai  5 років тому

      send me your email id.

  • @dr.naeemhaider4747
    @dr.naeemhaider4747 7 years ago

    Your video is very helpful for me in learning cluster analysis. I also want to know whether k-means can be applied to time series data as well, for example electricity consumption data for 50 companies over 3 months, where each company has 24 hours of discrete voltage and resistance values with time stamps. Can we use k-means with time series?

    • @bkrai
      @bkrai  7 років тому

      I would say try and see what you get, no harm in trying.

    • @dr.naeemhaider4747
      @dr.naeemhaider4747 7 років тому

      can you suggest a method for time series data.?

    • @dr.naeemhaider4747
      @dr.naeemhaider4747 7 years ago

      I tried it and it works fine, but I want to use times and dates as well. Any suggestions?

  • @stephravelo
    @stephravelo 8 років тому +1

    This is a very informative video. I hope you would have a repository github of your data so that we can play around with the script you used.

    • @bkrai
      @bkrai  5 років тому

      Here is the link: github.com/bkrai/Top-10-Machine-Learning-Methods-With-R

  • @deepaksingh9318
    @deepaksingh9318 7 years ago +1

    A good tutorial.
    1. Could you please also tell us when we should go for k-means and when we should go for hclust (i.e. the situations in which to select each method)?
    2. What do we mean when we say above average and below average (in hclust)? I mean, if the value is 1.05, are we saying that sales in cluster x are 1.05 higher than average?
    An explanation would be appreciated.
    Everything else is explained in a really simple way, so I am subscribing to the channel :)
    Keep it up.

    • @bkrai
      @bkrai  4 роки тому

      For more on clustering:
      ua-cam.com/video/otjWCaMcVaA/v-deo.html
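
On the second question in this thread: assuming the cluster averages were computed on z-score normalized data (each variable minus its mean, divided by its standard deviation), a value of 1.05 would mean 1.05 standard deviations above that variable's overall mean, not 1.05 units. The sketch below illustrates this on the stand-in `USArrests` data; `m`, `s` and `back` are names introduced here.

```r
x <- USArrests
m <- apply(x, 2, mean)                   # column means
s <- apply(x, 2, sd)                     # column standard deviations
z <- scale(x, center = m, scale = s)     # z-scores

round(colMeans(z), 10)                   # each normalized column averages 0

# A z-score converts back to original units as z * sd + mean.
back <- z[1, "Murder"] * s["Murder"] + m["Murder"]
back
x[1, "Murder"]
```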

  • @thejuhulikal6290
    @thejuhulikal6290 4 years ago +2

    Sir, please do a video on PAM algorithms!

    • @bkrai
      @bkrai  3 роки тому

      Thanks, I've added it to my list.

  • @mallorywright1453
    @mallorywright1453 5 років тому +1

    Do you have any examples of validating a cluster analysis using LPA?

    • @bkrai
      @bkrai  5 років тому

      I'm adding to the list of future videos.

  • @tabasummirza8638
    @tabasummirza8638 4 years ago +1

    Great tutorial. Please tell me how to label the clusters.

    • @bkrai
      @bkrai  4 роки тому

      You can come up with appropriate names for the labels by looking at averages for each cluster and each variable.
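
One way to attach names to clusters after reading the averages is to convert the membership vector to a factor. A sketch on stand-in data (`USArrests`); the labels "Group A" to "Group C" are placeholders, to be replaced with names that reflect the cluster averages.

```r
z <- scale(USArrests)
set.seed(123)
kc <- kmeans(z, centers = 3, nstart = 25)

# Inspect the averages per cluster in original units before naming anything.
aggregate(USArrests, list(cluster = kc$cluster), mean)

# Placeholder names; choose meaningful ones based on the table above.
labelled <- factor(kc$cluster, levels = 1:3,
                   labels = c("Group A", "Group B", "Group C"))
table(labelled)
```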

  • @asifjeelani1215
    @asifjeelani1215 3 роки тому +1

    thank you sir, very well explained

    • @bkrai
      @bkrai  3 роки тому

      Thanks for comments!

  • @biswadeepdas5528
    @biswadeepdas5528 8 років тому +1

    sir, it is quite good. I would really appreciate if you upload more videos .

    • @bkrai
      @bkrai  8 років тому

      +biswadeep das thanks for your feedback! I'll definitely create more such videos.

  • @niv2419
    @niv2419 7 років тому +1

    Hello sir, as always your videos have been very helpful and thank you for this video too. Also, I wanted to know if there is a way to improve between cluster distance? If so can you please let us know?
    Thank You!

    • @bkrai
      @bkrai  7 років тому +1

      You can increase or decrease number of clusters and see which one improves between cluster distance.

  • @theshubhnaam
    @theshubhnaam 6 років тому +1

    Best tutorial ever thank you sir..got the concept bt sir can you please share the utilities dataset..🙌🙌

    • @bkrai
      @bkrai  6 років тому

      Thanks for comments! Send email id.

    • @theshubhnaam
      @theshubhnaam 6 років тому

      Bharatendra Rai imshubhamv.25@gmail.com

    • @theshubhnaam
      @theshubhnaam 6 років тому +1

      Thank you sir

    • @bkrai
      @bkrai  6 років тому

      all set.

    • @theshubhnaam
      @theshubhnaam 6 років тому +1

      Bharatendra Rai yes sir🙌🙌

  • @nadiatulfarhanamohtar3078
    @nadiatulfarhanamohtar3078 5 years ago +1

    Hi, sorry, I don't understand this. Why does this dendrogram have 22 variables? Why doesn't it result in 8 variables, like the results in the book on applied multivariate analysis? Can you explain, please?

    • @bkrai
      @bkrai  5 years ago

      What you see on the dendrogram are the 22 companies for which clustering is carried out; they are not variables.

    • @nadiatulfarhanamohtar3078
      @nadiatulfarhanamohtar3078 5 years ago +1

      @@bkrai Oh yes. In the applied multivariate textbook, with the same dataset, the dendrogram doesn't have the companies; it only has the 8 variables, using complete linkage clustering. How can your dendrogram have 22 companies and not 8 variables?

    • @nadiatulfarhanamohtar3078
      @nadiatulfarhanamohtar3078 5 років тому +1

      @@bkrai can I have your email?

    • @bkrai
      @bkrai  5 years ago

      I do not understand the purpose of clustering variables. If the purpose is dimension reduction, PCA should have been done.

    • @bkrai
      @bkrai  5 років тому

      seemabharat@gmail.com

  • @im_karamo1907
    @im_karamo1907 6 років тому +1

    Thanks for the video... how can we get the video to practice on? Thanks again for the video

    • @bkrai
      @bkrai  6 років тому

      If you need data, send me email id.

    • @im_karamo1907
      @im_karamo1907 6 років тому +1

      @@bkrai my email ID # kamasbah@live.com

    • @bkrai
      @bkrai  6 років тому

      all set.

  • @harishnagpal21
    @harishnagpal21 6 років тому +1

    Nice video as always. I have couple of questions. In K means cluster example, if we want a list as per the three clusters, how do we tag that.
    2nd query, I have a data set of 100000 insurance customers having customer ids and their policy Face amount. I want to divide them in cluster ( say 5 cluster) and also want to know which customer comes in which cluster (same query as first) so that I can target them for a campaign. How do we do that and which clustering technique to use? Thanks in advance.

    • @bkrai
      @bkrai  6 років тому

      You can use something similar to kc$cluster that I've used at around 16:30 time point in the video.

    • @harishnagpal21
      @harishnagpal21 6 років тому

      Thanks
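
A sketch of the tagging step mentioned above: the membership vector returned by `kmeans()` is in the same row order as the input, so it can be added as a column next to the customer IDs. The customer data below is simulated purely for illustration (the commenter's data is not available), and `customers`, `face_amount`, `top` and `target` are hypothetical names.

```r
# Simulated stand-in for the insurance data: IDs and policy face amounts.
set.seed(123)
customers <- data.frame(
  id = sprintf("C%04d", 1:1000),
  face_amount = round(rlnorm(1000, meanlog = 11, sdlog = 0.8))
)

# Cluster on the normalized face amount.
kc <- kmeans(scale(customers$face_amount), centers = 5,
             nstart = 25, iter.max = 50)

# Tag each customer with its cluster.
customers$cluster <- kc$cluster
head(customers)

# Example: IDs of the customers in the cluster with the highest center.
top <- which.max(kc$centers)
target <- customers$id[customers$cluster == top]
length(target)
```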

  • @sudhakarbabunynavarapu8133
    @sudhakarbabunynavarapu8133 7 років тому +1

    Could you please send the data files for the practice what datafiles used in the tutorial.

    • @bkrai
      @bkrai  7 років тому

      email id?

  • @thejuhulikal6290
    @thejuhulikal6290 4 роки тому +1

    Hello sir, please upload a video on Qualitative comparative analysis!! thanks again sir

    • @bkrai
      @bkrai  4 роки тому

      I've added it to my list, thanks!

  • @VenkateshDataScientist
    @VenkateshDataScientist 8 років тому

    HAPPY NEW YEAR TO YOU AND YOUR FAMILY MEMBERS .
    Sir ,If you have time please upload support vector machine and Sentimental analysis .

    • @bkrai
      @bkrai  8 років тому

      A very happy new year to you and family too! I'll keep your suggestion in mind for next videos.

    • @bkrai
      @bkrai  8 років тому

      Here is the link to SVM:
      ua-cam.com/video/pS5gXENd3a4/v-deo.html&list=PL34t5iLfZddtII4ssT8FSUFP27fPYDEhY&index=25

  • @chitralalawat8106
    @chitralalawat8106 5 years ago +1

    Does mclust also require normalization of the data?

    • @bkrai
      @bkrai  5 років тому

      It's always better to do normalization.

    • @chitralalawat8106
      @chitralalawat8106 5 років тому +1

      @@bkrai I have many files which I want to concatenate..should I concatenate and then normalize the data or should I normalize and then concatenate?

    • @bkrai
      @bkrai  5 років тому

      You can first concatenate.

    • @chitralalawat8106
      @chitralalawat8106 5 років тому

      @@bkrai Are you sure?
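
One way to see why the order matters: normalizing each file separately gives every file its own mean and standard deviation, so the same raw value can end up with different normalized values. The sketch below splits the built-in `USArrests` data into two halves to imitate two files; `part1`, `part2`, `z` and `z_sep` are names introduced here.

```r
# Imitate two files by splitting one dataset.
part1 <- USArrests[1:25, ]
part2 <- USArrests[26:50, ]

# Concatenate first, then normalize once with common means and sds.
combined <- rbind(part1, part2)
z <- scale(combined)

# Normalize each part separately, then concatenate.
z_sep <- rbind(scale(part1), scale(part2))

# The two results differ because the parts have different means and sds.
same <- isTRUE(all.equal(z, z_sep, check.attributes = FALSE))
same
```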

  • @khushboobegwani1612
    @khushboobegwani1612 6 років тому +1

    Thank you so much sir for informative video. You really made it easy.

    • @bkrai
      @bkrai  6 років тому

      Thanks for your comments!

  • @azfersaeed1602
    @azfersaeed1602 8 років тому +1

    Great video man! Thank you very much for posting :). Could you show cluster analysis using more than 2 variables?

    • @bkrai
      @bkrai  8 років тому

      +Azfer Saeed Thanks for the feedback! In the example we have cluster analysis with 8 variables. However, for the scatter plot we use two variables at a time.

    • @azfersaeed1602
      @azfersaeed1602 8 років тому

      +Bharatendra Rai You are correct...sorry for the incorrect semantics. At 2:15, you mention that broadly, there are 3 clusters but they are based only on 2 variables. Is there a way to create clusters based on more than 2 variables?
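
Editor's sketch of clustering on more than two variables. The built-in `USArrests` data (4 numeric columns) is an assumed stand-in for the utilities data, and the choice of complete linkage and 3 clusters is only for illustration.

```r
# USArrests (built into R) stands in for the utilities data; it has 4 numeric variables.
z <- scale(USArrests)

# Euclidean distance uses every column of z, not just two of them.
d <- dist(z)

# Hierarchical clustering with complete linkage, cut into 3 clusters.
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 3)

# Cluster sizes and per-cluster means on all 4 original variables.
table(groups)
aggregate(USArrests, by = list(cluster = groups), FUN = mean)

# A scatter plot shows two variables at a time, colored by the multi-variable clusters.
plot(USArrests$Murder, USArrests$UrbanPop, col = groups, pch = 19,
     xlab = "Murder", ylab = "UrbanPop")
```

The clusters are built from all columns at once; the scatter plot is only a two-dimensional view of that result.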

  • @phediasdiamandis2441
    @phediasdiamandis2441 8 років тому +1

    Great Video. Congrats

    • @bkrai
      @bkrai  8 років тому

      +Phedias Diamandis thanks for the feedback 👍

  • @sayedyavar3752
    @sayedyavar3752 7 років тому +1

    I want to remove multiple columns from my data set, just like you removed the company column. What code should I use?

    • @bkrai
      @bkrai  7 років тому

      Let's say you want to remove columns 2 and 4 from 'data', which has 5 columns. Then,
      data1
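
Editor's sketch: the code in the reply above appears cut off after `data1`. One common way to drop columns 2 and 4 is negative indexing, shown here on a hypothetical 5-column data frame. This is an assumption about what was intended, not a recovery of the original reply.

```r
# Hypothetical data frame with 5 columns, standing in for 'data'.
data <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Negative indices drop columns by position: remove columns 2 and 4.
data1 <- data[, -c(2, 4)]
names(data1)  # "a" "c" "e"

# Equivalent selection by column name.
data2 <- data[, !(names(data) %in% c("b", "d"))]
```

Dropping by name is less fragile than dropping by position if the column order might change.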

  • @rinoypaultharu5071
    @rinoypaultharu5071 5 років тому

    Great tutorial, it really helped with my analysis. I have some doubts. For the silhouette calculation, do we need to check the average silhouette value, or which value should we check to find the number of clusters? Please help me with that. In your analysis, what is the silhouette value for k=3, and where is it shown on the plot?
    Second, while calculating the Euclidean distance for my 40 observations, R does not show the complete rows of the distance matrix. Is there any other way to obtain the complete matrix?
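
Editor's sketch addressing both parts of the question above, under stated assumptions: the first 40 rows of the built-in `USArrests` data stand in for the commenter's data, k-means is used for the clustering, and `silhouette()` comes from the `cluster` package that ships with standard R installations.

```r
library(cluster)  # provides silhouette()

# First 40 rows of USArrests stand in for a 40-observation data set.
z <- scale(USArrests[1:40, ])
d <- dist(z)

# Average silhouette width for k = 2..6; a larger average suggests a better k.
set.seed(123)
avg_width <- sapply(2:6, function(k) {
  kc <- kmeans(z, centers = k, nstart = 25)
  sil <- silhouette(kc$cluster, d)
  mean(sil[, "sil_width"])
})
names(avg_width) <- 2:6
avg_width

# Full 40 x 40 distance matrix as an ordinary matrix.
dm <- as.matrix(d)
dim(dm)
round(dm[1:5, 1:5], 2)
# write.csv(dm, "distances.csv")  # optional: inspect the whole matrix in a spreadsheet
```

Comparing the average width across several values of k is a common heuristic; the per-observation widths in the `sil_width` column show which points sit poorly in their cluster.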

  • @gambhiraogirish1710
    @gambhiraogirish1710 7 років тому +1

    Thanks for the great explanation, sir. May I have the data set for practice, please?
    Thanks again, sir.

  • @kikaarias709
    @kikaarias709 6 років тому +1

    It was very easy for me to follow. Could you please send me the .CSV? I need to do the exercise by myself.

    • @kikaarias709
      @kikaarias709 6 років тому

      My mail is ing.erikarias@gmail.com

    • @bkrai
      @bkrai  6 років тому

      all set.

    • @kikaarias709
      @kikaarias709 6 років тому +1

      Thank you so much, this tutorial has been so helpful for my class.

  • @klaows
    @klaows 6 років тому +1

    Thank you for your video.
    I tried to practice, but my data has 48 rows, and after normalization it omitted 26 rows.
    What should I do?
    Thank you

    • @bkrai
      @bkrai  6 років тому +1

      Normalization should not lead to omitting rows.

    • @klaows
      @klaows 6 років тому

      Bharatendra Rai Thank you for your reply. Yes, I wonder about that. The program omitted the rows by itself.
      I continued until I got the dendrogram. The trend looks good, but I have no idea whether it is correct.

    • @liamhannah6325
      @liamhannah6325 6 років тому +1

      @@klaows Sometimes R does that just to shorten the printed output; you can adjust it with options(max.print = 9999999)
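
Editor's sketch of the behavior discussed in this thread, using a simulated 48-row data set (hypothetical). In base R the limit on printed entries is the `max.print` option; the rows reported as omitted are only left out of the printout, not out of the data.

```r
# Hypothetical 48-row data set with 5 numeric columns.
set.seed(42)
df <- data.frame(matrix(rnorm(48 * 5), nrow = 48))

z <- scale(df)
nrow(z)  # 48: normalization keeps every row

# With a small print limit, R prints only part of the output and reports omitted rows.
old <- options(max.print = 100)
print(z)   # truncated, ends with a note about omitted rows
options(max.print = 99999)
print(z)   # all 48 rows
options(old)  # restore the previous setting
```

Checking `nrow()` or `dim()` is a quick way to confirm that no observations were actually dropped before the dendrogram was built.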

  • @mfkalabdullah6966
    @mfkalabdullah6966 8 років тому +1

    Sir, do you have more videos on clustering? Also, can I contact you in the future regarding clustering? I'm doing research using data mining clustering.

    • @bkrai
      @bkrai  4 роки тому

      There is a playlist on clustering:
      ua-cam.com/video/otjWCaMcVaA/v-deo.html

  • @surbhiagrawal3951
    @surbhiagrawal3951 4 роки тому +1

    Sir, can we not directly normalise the data using scales(utilities[,c(1,1)])?

    • @bkrai
      @bkrai  4 роки тому +1

      Yes, there are several ways to normalize and get same results.
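
Editor's sketch of two equivalent ways to normalize, assuming the built-in `USArrests` data as a stand-in for the numeric columns of the utilities data. It uses base R's `scale()`; the manual route with explicit means and standard deviations is shown only for comparison.

```r
# USArrests stands in for the numeric columns of the utilities data.
x <- USArrests

# Manual normalization: subtract column means, divide by column standard deviations.
m <- apply(x, 2, mean)
s <- apply(x, 2, sd)
z_manual <- scale(x, center = m, scale = s)

# Direct normalization: scale() computes the same means and standard deviations itself.
z_direct <- scale(x)

# Compare the values only, ignoring the stored centering and scaling attributes.
all.equal(z_manual, z_direct, check.attributes = FALSE)
```

Note that `scale()` needs numeric input, so any identifier column such as the company name has to be excluded first.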

    • @surbhiagrawal3951
      @surbhiagrawal3951 4 роки тому +1

      Sir, you are very fast in replying. I am in Germany, where it is 12 o'clock at night, and I am studying right now through your video and you replied instantly; it is probably 3 o'clock at night in India. Thanks a lot for the great content.

    • @bkrai
      @bkrai  4 роки тому +1

      I'm based in US, so problem with time.

    • @bkrai
      @bkrai  4 роки тому

      I meant no problem with time

  • @sk93359
    @sk93359 6 років тому +1

    Dear Bharatendra Rai, can you please make a video on SOM clustering with examples, and please explain SOM clustering briefly in Hindi as well as English? Please.

    • @bkrai
      @bkrai  6 років тому

      Thanks for the suggestion, I've added this to my list.

    • @sk93359
      @sk93359 6 років тому

      I am waiting for your video on SOM clustering. When will it be uploaded?

  • @sathiarams7273
    @sathiarams7273 8 років тому +1

    Nice video and beautiful explanation. Where can I download the utilities data set? Please help.

    • @bkrai
      @bkrai  8 років тому

      send me your email id.

  • @Bidushranjan
    @Bidushranjan 4 роки тому +1

    Sir, can you make a video about the D2.dist function of the biotools package, which calculates the D2 distance matrix easily,
    and about Tocher's method of clustering, which is mostly used in agricultural research?

    • @bkrai
      @bkrai  4 роки тому

      Thanks, added to my list.

  • @aks1008
    @aks1008 5 років тому +1

    Sir, how do we remove multicollinearity in cluster analysis, given that it is an unsupervised algorithm and there is no dependent variable?

    • @bkrai
      @bkrai  5 років тому

      Multicollinearity is a problem only for regression models. For cluster analysis it is not an issue.

  • @jyoti9426
    @jyoti9426 5 років тому +1

    How to plot clusters if I already know the affiliations of the nodes?

    • @bkrai
      @bkrai  4 роки тому

      Not sure about your question, but you may try this:
      ua-cam.com/video/wLu213JKfnQ/v-deo.html

  • @javzmaatsend3785
    @javzmaatsend3785 4 роки тому +1

    Thank you, Very easy

    • @bkrai
      @bkrai  4 роки тому

      You are welcome!

  • @pavankumarpotta4565
    @pavankumarpotta4565 8 років тому +1

    Please add a video on how to do Ward's method.

    • @bkrai
      @bkrai  3 роки тому

      seeing today, but thanks!
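
Editor's sketch of Ward's method, for readers with the same request. It assumes the built-in `USArrests` data as a stand-in and uses base R's `hclust()` with `method = "ward.D2"`; the choice of 3 clusters is arbitrary.

```r
# USArrests stands in for the tutorial data; normalize before computing distances.
z <- scale(USArrests)
d <- dist(z)

# Ward's minimum-variance linkage on Euclidean distances.
hc_ward <- hclust(d, method = "ward.D2")

# Dendrogram, then membership for a 3-cluster solution.
plot(hc_ward, hang = -1, cex = 0.6)
groups <- cutree(hc_ward, k = 3)
table(groups)
```

`hclust()` also offers `"ward.D"`, which expects squared distances as input to match Ward's criterion, so `"ward.D2"` is the simpler choice with plain `dist()` output.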