Silhouette (clustering)- Validating Clustering Models- Unsupervised Machine Learning

  • Published 31 Dec 2024

COMMENTS • 67

  • @arjundev4908
    @arjundev4908 4 years ago +35

    To summarise the clustering points:
    1. Clustering is an unsupervised machine learning technique.
    2. To find the number of clusters, we use the "Elbow method".
    3. The silhouette (si-lo-wet) score helps check whether the data points in each cluster actually belong to that cluster.
    The formula is s = (b - a) / max(a, b), where 'a' is the mean distance to the other points within the same cluster and 'b' is the mean distance to the points of the nearest other cluster. If a < b the score is positive; if a > b the score is negative, and it is highly likely that the point has been assigned to the wrong cluster.
    Please add any points I missed or correct me if I am wrong!
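
A minimal sketch of the formula in the comment above, assuming 'a' is the mean distance from a point to the other points in its own cluster and 'b' is the mean distance to the points of the nearest other cluster. The function name `silhouette_coefficient` and the zero-distance convention are illustrative choices, not something taken from the video.

```python
def silhouette_coefficient(a: float, b: float) -> float:
    """Silhouette value of a single point.

    a: mean distance to the other points in the same cluster.
    b: mean distance to the points of the nearest other cluster.
    """
    largest = max(a, b)
    if largest == 0:
        # Assumption for this sketch: treat the degenerate case as 0.
        return 0.0
    return (b - a) / largest


# a < b: the point sits closer to its own cluster, so the value is positive.
print(silhouette_coefficient(1.0, 3.0))

# a > b: the point sits closer to another cluster, so the value is negative.
print(silhouette_coefficient(3.0, 1.0))
```

The value always lies between -1 and 1, which is why scores from different values of k can be compared directly.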

    • @shubhamthapa7586
      @shubhamthapa7586 4 years ago

      What do you think of DBSCAN and HDBSCAN? The silhouette coefficient won't work there because we don't have the number of clusters as a parameter, right?

    • @sampadkar19
      @sampadkar19 1 month ago

      @shubhamthapa7586 Do we actually need k?
      Because hierarchical clustering and DBSCAN both form clusters, we can easily compute it using the formula.

  • @NashatJumaah
    @NashatJumaah 6 months ago

    Thanks for the easy-to-follow tutorials... Big love from Iraq

  • @chintansoni6370
    @chintansoni6370 3 years ago

    Thanks Krish. Much better understanding. I really appreciate your efforts to share knowledge by creating videos.

  • @ramendrachaudhary9784
    @ramendrachaudhary9784 4 years ago +1

    Pretty simple and pretty amazing explanation. Thanks Krish

  • @kushagrak4903
    @kushagrak4903 3 years ago +9

    Sir, I have a doubt: in your example the silhouette value for k = 2 is greater than for k = 4, so how can we decide that k = 4 is better than k = 2 without drawing any diagram?

  • @matheusgoes640
    @matheusgoes640 2 years ago

    Thank you so much for this video! Greatly explained! 👏

  • @mprasad3661
    @mprasad3661 4 years ago +2

    Awesome explanation bro

  • @sihammohamed7480
    @sihammohamed7480 4 years ago +2

    Thanks Krish Awesome explanation :)

  • @DeepGamingAI
    @DeepGamingAI 4 years ago +2

    Loved it, very nicely explained :)

  • @naufalsiregar9662
    @naufalsiregar9662 2 years ago

    I just saw your video, this is great... Thanksss

  • @kamilc9286
    @kamilc9286 4 years ago +1

    Sir, if you count the negative samples, k=4 has 1 (not visible on the chart). So k=2 has a score of 0.74 with 0 negative values, and k=4 has a score of 0.65 with 1 negative value. Only the elbow method suggests k=4.
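
One way to do the count described in this comment is with `silhouette_samples` from scikit-learn, which returns one value per point. This is a sketch on synthetic `make_blobs` data chosen for illustration; it is not the data set from the video, so the numbers it prints will not match the ones quoted above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Synthetic data (an assumption for this sketch, not the video's data).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=1.0, random_state=1)

summary = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    per_point = silhouette_samples(X, labels)  # one silhouette value per sample
    mean_score = float(per_point.mean())       # same quantity as silhouette_score
    n_negative = int((per_point < 0).sum())    # points closer to another cluster
    summary[k] = (mean_score, n_negative)
    print(f"k={k}: mean silhouette={mean_score:.3f}, negative samples={n_negative}")
```

Counting negatives this way catches bars that are too thin to see in the silhouette plot.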

  • @DineshBabu-gn8cm
    @DineshBabu-gn8cm 3 years ago

    Excellent explanation Krish

  • @thankyouthankyou1172
    @thankyouthankyou1172 4 years ago +3

    8:18: why is there no mention of the min()?

    • @krishnag5734
      @krishnag5734 3 years ago +1

      The min is used to find the nearest neighboring cluster for a point i in the current cluster Ci. To get it, compute the mean distance from point i to the points of every other cluster and take the minimum. That minimum is b(i), and the cluster that produces it is the nearest neighbor. Let's say you have 5 different clusters and point i is in the 1st: you will have 4 different mean distances, one for each of the other 4 clusters. Whichever value is the minimum, the corresponding cluster is the closest one to point i.
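
The role of the min can be sketched from scratch in plain Python. The helper name `silhouette_values` and the tiny one-dimensional data set are made up for illustration, and singleton clusters are given a value of 0 by convention in this sketch.

```python
from collections import defaultdict
from math import dist


def silhouette_values(points, labels):
    """Return the silhouette value s(i) for every point, using Euclidean distance."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    values = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            values.append(0.0)  # singleton cluster
            continue
        # a(i): mean distance to the other points of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b(i): smallest mean distance to any other cluster (this is the min).
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        values.append((b - a) / largest if largest > 0 else 0.0)
    return values


# Three clusters on a line: {0, 2}, {10, 12}, {30, 32}.
points = [(0.0,), (2.0,), (10.0,), (12.0,), (30.0,), (32.0,)]
labels = [0, 0, 1, 1, 2, 2]
values = silhouette_values(points, labels)
print(values)
```

For the first point, a = 2, the mean distances to the other two clusters are 11 and 31, so b = min(11, 31) = 11 and s = (11 - 2) / 11.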

  • @stevemungai3542
    @stevemungai3542 2 years ago

    excellent explanation

  • @amitgupta-tb9td
    @amitgupta-tb9td 4 years ago +2

    Sir, please make more videos on validation techniques for unsupervised learning!

  • @DevanshKhandekar
    @DevanshKhandekar 4 years ago +5

    This can be explained better with terms like 'intracluster distance' and 'intercluster distance'.

  • @prasadshiva3538
    @prasadshiva3538 1 year ago

    That's a great lecture, Krish. But what if we get k=5 from the silhouette score and k=4 from the elbow method? How do we conclude which k is correct?
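
When the two criteria disagree, it can help to print them side by side before deciding. The sketch below uses scikit-learn's `KMeans.inertia_` (the quantity plotted in the elbow method) and `silhouette_score` on synthetic data. The data and the rule of picking the highest mean silhouette are assumptions for illustration, not a definitive selection procedure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (an assumption for this sketch).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

rows = []
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia = float(model.inertia_)  # within-cluster sum of squares (elbow)
    score = float(silhouette_score(X, model.labels_))
    rows.append((k, inertia, score))
    print(f"k={k}: inertia={inertia:.1f}, mean silhouette={score:.3f}")

# One simple rule: take the k with the highest mean silhouette.
best_k = max(rows, key=lambda row: row[2])[0]
print("k with the highest mean silhouette:", best_k)
```

If the candidates are close, the per-cluster silhouette plot and domain knowledge about the data are usually the tie-breakers.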

  • @पंकजकुलड़िया

    At 18:02 there is a negative value for cluster no. 2, so I think k should be 2. Please clarify, sir. Also, thanks a lot for making this kind of video and the virtual interview sessions...

    • @mohammedameen3249
      @mohammedameen3249 3 years ago

      But the elbow method was used, which shows k = 4 is the best.

  • @ashulohar8948
    @ashulohar8948 2 years ago

    Please make videos on performance metrics for regression algorithms.

  • @sajidchoudhary1165
    @sajidchoudhary1165 4 years ago +4

    Sir, please make a video on the math intuition behind XGBoost for regression and classification. Please, sir!

    • @sajidchoudhary1165
      @sajidchoudhary1165 4 years ago +1

      @Karthik Vishwanath Yes, I was watching, but I couldn't understand some hyperparameters like gamma, lambda, and cover.

  • @iqbalsaviola6052
    @iqbalsaviola6052 3 years ago

    Something that I'm looking for

  • @1potdish271
    @1potdish271 3 years ago

    Very nice explanation. (y)

  • @rishisingh5581
    @rishisingh5581 4 years ago +2

    Sir, will the community class launch today or not?

  • @divyamadhuri126
    @divyamadhuri126 2 years ago

    That was super helpful. My doubt is that if my clusters are overlapping, what should I interpret? Are my data points poorly clustered?

  • @vignesh7687
    @vignesh7687 3 years ago

    Thank you Krish, but I can see a small negative value for k = 4 in the plot.

  • @MrChudhi
    @MrChudhi 3 years ago

    Can you please explain how we can use the adjusted Rand score for a K-Means model?

  • @sandipansarkar9211
    @sandipansarkar9211 2 years ago

    finished watching

  • @sukshithshetty4847
    @sukshithshetty4847 2 years ago

    Has this code been shown from scratch in some other video?

  • @sagemaker
    @sagemaker 11 months ago

    Pronunciation: Sil-who-at

  • @rhiothelab5251
    @rhiothelab5251 4 years ago

    Thanks, sir, for all the content.
    Sir, suppose I am using DBSCAN and it gives only one cluster. How do I measure the correctness of that cluster?

  • @anbarasanpm3295
    @anbarasanpm3295 10 months ago

    How is the point chosen in Cluster 1?

  • @sandipansarkar9211
    @sandipansarkar9211 4 years ago

    Superb explanation. Need to get my hands dirty with a Jupyter notebook. Thanks

  • @rohitkamra1628
    @rohitkamra1628 4 years ago

    What can we expect on the Discord server?

  • @adotac
    @adotac 2 years ago

    Can someone tell me what software he used to draw on screen?

  • @sakshamshivhare2474
    @sakshamshivhare2474 2 years ago

    Hi Krish, in an interview I was asked how to interpret the results of K-means clustering and how to label the results. Can you or anyone help me out with this question?

  • @rezamohammadi3096
    @rezamohammadi3096 4 years ago +1

    Hi @Krish
    Thanks for the amazing tutorial. I'm using the k-prototypes library (for mixed numerical and nominal data types) and I want to calculate the silhouette index to compare my clustering results with previous studies (e.g. k-medoids). Could you please give me a clue on how to calculate the silhouette index in my case?

    • @saisidhartha2855
      @saisidhartha2855 3 years ago

      This is the same case with me; kindly let me know whether it is possible. I have used the elbow method for k-prototypes to determine the k value. Looking forward to the silhouette method as well.

    • @zama-sarib
      @zama-sarib 2 years ago

      @saisidhartha2855 We can use Gower distance, passing the gower_distance matrix to the silhouette function in sklearn with metric="precomputed". Gower distance typically works well with mixed data, i.e. numerical and categorical data types.
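
A sketch of the precomputed-distance approach suggested in this reply. The `metric="precomputed"` argument of `silhouette_score` is part of scikit-learn. The helper `gower_matrix_simple` is a simplified, hand-written Gower distance made up for this example (it is not the `gower` package), and the data and cluster labels are hypothetical stand-ins for k-prototypes output.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def gower_matrix_simple(numeric, categorical):
    """Simplified Gower distance: range-scaled numeric gaps plus 0/1 category mismatches."""
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical, dtype=object)
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # avoid dividing by zero for constant columns
    # Numeric part: |x_i - x_j| / range, per feature, for every pair of rows.
    num_part = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges
    # Categorical part: 0 if the categories match, 1 otherwise.
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).astype(float)
    n_features = numeric.shape[1] + categorical.shape[1]
    return (num_part.sum(axis=2) + cat_part.sum(axis=2)) / n_features


# Hypothetical mixed data: (age, income) and (occupation, housing).
numeric = [[25, 30000], [27, 32000], [26, 31000], [60, 90000], [62, 95000], [58, 88000]]
categorical = [
    ["student", "rent"], ["student", "rent"], ["student", "own"],
    ["retired", "own"], ["retired", "own"], ["retired", "own"],
]
labels = [0, 0, 0, 1, 1, 1]  # hypothetical cluster labels, e.g. from k-prototypes

D = gower_matrix_simple(numeric, categorical)
score = silhouette_score(D, labels, metric="precomputed")
print(f"silhouette on Gower distances: {score:.3f}")
```

Because the distance matrix is passed in directly, the same call works for any clustering method that yields labels, provided the same distance is used when comparing results across studies.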

  • @soniakashyap001
    @soniakashyap001 4 years ago

    Hi Krish, when I searched for a method to check the validity of clusters, I only got elbow methods in the results. I never came across links related to silhouettes. How did you find this method?

    • @mranaljadhav8259
      @mranaljadhav8259 4 years ago

      Try checking all the links Google provides; I think you opened only one link.
      I searched for "validity of cluster" like you and got the right methods, such as silhouette, Dunn index, etc.

  • @yashsethi2402
    @yashsethi2402 3 years ago

    What is the name of the plot that you created?

  • @varnikareshma1873
    @varnikareshma1873 2 years ago

    Sir, instead of using Euclidean or Manhattan distance, can we use a cosine-based distance? If it is possible, can you please give me a hint on how to use it?
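
For the silhouette itself, scikit-learn's `silhouette_score` accepts a `metric` argument, and `"cosine"` is one of the supported values. The sketch below is one possible setup on made-up data. Note the assumption in it: `KMeans` always uses Euclidean distance, so the rows are L2-normalised first, which makes Euclidean distance on the unit vectors agree in ordering with cosine distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
# Two made-up groups of vectors pointing in clearly different directions.
group_a = rng.normal(loc=[5.0, 0.5], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[0.5, 5.0], scale=0.3, size=(50, 2))
X = np.vstack([group_a, group_b])

# KMeans is Euclidean; clustering unit-length rows approximates cosine clustering.
X_unit = normalize(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_unit)

# Evaluate the clustering with cosine distance instead of Euclidean.
score_cosine = silhouette_score(X, labels, metric="cosine")
print(f"cosine silhouette: {score_cosine:.3f}")
```

The same `metric` argument also accepts `"manhattan"` and `"euclidean"`, so the scores under different distances can be compared on the same labels.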

  • @ayushgoel9584
    @ayushgoel9584 4 years ago +1

    What happened to the community classes?

  • @sambit123sahu
    @sambit123sahu 3 years ago

    what is the value of |C| ?

  • @hicodeguru
    @hicodeguru 3 years ago

    X has 500 features, but KMeans is expecting 1000 features as input. ??

  • @chandinisaikumar2736
    @chandinisaikumar2736 4 years ago

    Can we use this evaluation method for DBSCAN?

    • @shubhamthapa7586
      @shubhamthapa7586 4 years ago

      No, I don't think so.

    • @chandinisaikumar2736
      @chandinisaikumar2736 4 years ago

      @shubhamthapa7586
      Can you please let me know the evaluation method for DBSCAN?
      Thanks in advance

    • @shubhamthapa7586
      @shubhamthapa7586 4 years ago

      @chandinisaikumar2736 Currently there is no single best evaluation metric for DBSCAN. However, you can use the silhouette coefficient as a reference, but you need to optimize the parameters of DBSCAN first, which is harder than for KMeans clustering. honeynet.github.io/cuckooml/2016/07/19/clustering-evaluation/ makes good use of the silhouette coefficient with DBSCAN.
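
A sketch of how the silhouette coefficient can be applied to DBSCAN output with scikit-learn. DBSCAN labels noise points as -1. Dropping them before scoring is one common choice and an assumption of this example, as are the toy data and the `eps` and `min_samples` values. The score is only defined when at least two clusters remain, so a single-cluster result cannot be scored this way.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Toy data: two tight groups of four points and one far-away outlier.
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1],
    [10.0, 0.0],
])

labels = DBSCAN(eps=0.3, min_samples=3).fit_predict(X)

clustered = labels != -1              # DBSCAN marks noise with the label -1
n_noise = int((~clustered).sum())
n_clusters = len(set(labels[clustered].tolist()))

if n_clusters >= 2:
    # Score only the points that were assigned to a cluster.
    score = float(silhouette_score(X[clustered], labels[clustered]))
else:
    score = None  # undefined for fewer than two clusters

print(f"clusters={n_clusters}, noise points={n_noise}, silhouette={score}")
```

Since excluding noise can flatter the score, it is worth reporting the number of noise points next to it when comparing `eps` and `min_samples` settings.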

  • @aimem246
    @aimem246 3 years ago

    thx

  • @oscarfamousdarteh189
    @oscarfamousdarteh189 4 years ago

    Sir, could you please show another video with uploaded CSV data? I'm having some difficulties.

  • @raviyadav2552
    @raviyadav2552 4 years ago

    si-low-et

  • @manikprabhug412
    @manikprabhug412 4 years ago

    Hi sir

  • @TJ-wo1xt
    @TJ-wo1xt 3 years ago

    Don't pay Wikipedia; not everything there is correct.

    • @diosmorbodiosmorbo9547
      @diosmorbodiosmorbo9547 3 years ago

      It is indeed. The claim that "anyone can write anything on Wikipedia" is false. There is a very strict and dedicated community validating every edit someone writes. And with new topics like ML, even more so. So stfu