StatQuest: Hierarchical Clustering

  • Published 27 Dec 2024

COMMENTS • 381

  • @statquest
    @statquest  2 years ago +12

    Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

  • @Aemilindore
    @Aemilindore 3 years ago +157

    You're a person who saved me lots of time and pain. Thank you. I wish you the best

    • @statquest
      @statquest  3 years ago +4

      Thank you very much! :)

  • @kristinomalley4519
    @kristinomalley4519 1 year ago +25

    You are, and I cannot stress this enough, a national treasure!! The ease in how you explain things that have eluded me for over a decade and make it click is truly a gift. Thank you so freaking much!!!

    • @statquest
      @statquest  1 year ago +2

      Wow, thank you!

  • @anamulmbdu
    @anamulmbdu 6 years ago +199

    The intro song removed my fear of clustering. Thanks for the awesome video.

    • @nemothekitten3994
      @nemothekitten3994 2 years ago +3

      going on a statequest😌

    • @w花b
      @w花b 2 months ago

      @@nemothekitten3994 aww...

  • @julieboissiere4553
    @julieboissiere4553 2 years ago +17

    I used to watch your videos while I was a student. It’s been 3 years since my graduation and I’m still here (I’m changing jobs and need to review some stuff).
    Thank you a lot for your incredible work

    • @statquest
      @statquest  2 years ago +8

      Congratulations on the new job! BAM! :)

  • @fadikhattar290
    @fadikhattar290 2 years ago +6

    I still don't believe how this content is free. Thank you sir!

  • @KL1_Khaled
    @KL1_Khaled 4 months ago +3

    Even after 7 years, you're still the lifesaver

    • @statquest
      @statquest  4 months ago +2

      Glad I could help!

  • @yamikag8363
    @yamikag8363 2 years ago +4

    your videos help me see the "big picture" of concepts. after your videos, I can actually understand what is going on and why we are doing something. Thank you!

  • @brunomartel4639
    @brunomartel4639 4 years ago +125

    this video proved that "hard" stuff = badly explained stuff

    • @sindhujas7807
      @sindhujas7807 4 years ago +1

      so fuckin true. Not sorry for swearing. Happy learning guys

    • @gummybear8883
      @gummybear8883 3 years ago +4

      if you can't explain something in simple terms, then you don't understand it that well.

    • @julius4858
      @julius4858 3 years ago +7

      @@gummybear8883 or you've been a professor for 20 years and are so deep into a topic that you completely forgot how people approach new problems. Your sentence really only applies to novices trying to be teachers.

    • @MungoBootyGoon
      @MungoBootyGoon 3 years ago +6

      @@julius4858 We could just change it to: if you can't explain something in simple terms, then you can't teach it that well.

    • @julius4858
      @julius4858 3 years ago

      @@MungoBootyGoon Yeah, that is absolutely true. Many of my professors for theoretical computer science are experts on various fields but man do their explanations suck. That's why I have to watch youtube videos for stuff like this.

  • @stephenwood9252
    @stephenwood9252 2 years ago +5

    Love your videos. The fact that you make it so simple shows the depth of your understanding.

  • @chikken007
    @chikken007 4 years ago +4

    I already watched some of your videos. This one I watched because I want to apply hierarchical clustering in my thesis. It is about time I buy one of your sweaters. I hope this supports you. Thanks for all the truly great explanations. THANK YOU!

    • @statquest
      @statquest  4 years ago +1

      Thank you very much!!! :)

  • @rajshrestha9484
    @rajshrestha9484 5 years ago +56

    I can't thank you enough. Such clear and helpful explanations. Great.

  • @patolizac23
    @patolizac23 13 days ago +1

    my teacher keeps flying to new york and doesn't teach us crap about this so thank you for this pookie

  • @scraps7624
    @scraps7624 2 years ago +7

    This channel is a treasure! Absolutely incredible job my man

    • @statquest
      @statquest  2 years ago +1

      Thank you so much 😀!

  • @HiasHiasHias
    @HiasHiasHias 6 months ago +2

    StatQuest never disappoints

  • @websciencenl7994
    @websciencenl7994 2 years ago +1

    StatQuest is the Best! Teaching is an art...and these are masterpieces.

    • @statquest
      @statquest  2 years ago

      WOW! Thank you very much! :)

  • @jingsilu5568
    @jingsilu5568 2 years ago +1

    Thank you for clearly explaining the details at a moderate speed! You save me lots of time!

  • @loftyTHEOWNER
    @loftyTHEOWNER 2 years ago +2

    I would like to add that:
    - single linkage (comparing the closest points of 2 clusters) tends to form elongated, chain-like clusters;
    - complete linkage (comparing the farthest points of 2 clusters) tends to form more compact, globular clusters.
    Also, because every linkage is built on distances between points, whether you leave your data unscaled or scale it with a StandardScaler or a MinMaxScaler will affect your clustering.
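
To see why scaling matters, here is a minimal plain-Python sketch on a made-up dataset where income (in dollars) has a much larger range than age (in years). The two helper functions only imitate what scikit-learn's StandardScaler (subtract the mean, divide by the population standard deviation) and MinMaxScaler (rescale to 0-1) do; they are not scikit-learn itself, and all names and values are hypothetical.

```python
import math
import statistics

# Hypothetical samples: (income in dollars, age in years)
samples = {
    "A": (50_000, 25),
    "B": (51_000, 60),
    "C": (53_000, 26),
    "D": (90_000, 40),
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Z-score each column (imitating StandardScaler): (x - mean) / population std."""
    cols = list(zip(*rows))
    stats = [(statistics.mean(c), statistics.pstdev(c)) for c in cols]
    return [tuple((x - m) / s for x, (m, s) in zip(row, stats)) for row in rows]

def min_max(rows):
    """Rescale each column to [0, 1] (imitating MinMaxScaler)."""
    cols = list(zip(*rows))
    bounds = [(min(c), max(c)) for c in cols]
    return [tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(row, bounds)) for row in rows]

def nearest_to(name, points):
    """Return the sample closest to `name` under Euclidean distance."""
    others = [n for n in points if n != name]
    return min(others, key=lambda n: euclidean(points[name], points[n]))

names = list(samples)
rows = list(samples.values())
for label, data in [("raw", rows), ("standardized", standardize(rows)), ("min-max", min_max(rows))]:
    print(label, nearest_to("A", dict(zip(names, data))))
# raw B
# standardized C
# min-max C
```

Without scaling, the small income gap between A and B swamps their 35-year age gap; after either kind of scaling, C becomes A's nearest neighbor, so the first merge of a hierarchical clustering would change too.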

  • @marahakermi-nt7lc
    @marahakermi-nt7lc 1 year ago +2

    ohh my god thanks josh u are so brilliant i think marvel should add another new superhero "josh starmer the life saver"

  • @pragyamishra9083
    @pragyamishra9083 3 years ago +5

    The visualizations and simplicity of explanations as well as great examples motivate me to keep learning. Thank you so much for making it so interesting. I'll try to do my bit by buying a t-shirt. 😊

  • @jonathanl7204
    @jonathanl7204 1 year ago +2

    Thank you. Better than university teaching

  • @vishk123
    @vishk123 1 year ago +1

    Thank you for allowing me to ascend the stats hierarchy!

  • @datdao6982
    @datdao6982 3 years ago

    Hi, just a question. At 7:16, if I'm not mistaken, Gene 1 and Gene 2 are analogous to variables 1 and 2 (aka x and y in a 2-dimensional dataset). So shouldn't the distance be sqrt( (x1-x2)^2 + (y1-y2)^2 ), or sqrt( (1.6-0.5)^2 + (-0.5+1.9)^2 )? Sorry if this seems like a stupid question, but since I'm not that good at maths in general, I need to reduce everything to the basics to understand. Thank you

    • @statquest
      @statquest  3 years ago +1

      In this example we are trying to find how similar (or different) Gene 1 is to (or from) Gene 2 across all samples, so we are comparing the distances between Gene 1 and Gene 2 in both samples. In other words, if both genes have similar values in Sample #1 and similar values in Sample #2, then we will consider both genes to be similar. In contrast, if the values for Gene 1 and 2 are different from each other in Sample #1 and different from each other in Sample #2, then we will consider the genes to be very different from each other. Thus, we are looking at the difference between the genes within each sample.
      In contrast, you are asking to look at the sample differences within each gene. This would tell us whether Sample #1 and Sample #2 are similar or not, and, in this example, we are not interested in that. Does that make sense?

    • @datdao6982
      @datdao6982 3 years ago +1

      @@statquest I kinda get it. Thank you.
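
To make the reply concrete, here is a small Python sketch. It assumes the values quoted in the question are Gene 1 = 1.6 and 0.5 and Gene 2 = -0.5 and -1.9 in Samples #1 and #2 (read off the commenter's formula, not checked against the video). The gene-to-gene distance compares the two genes within each sample; the commenter's formula instead compares the two samples within each gene.

```python
import math

# Assumed values read off the question: gene -> [Sample #1, Sample #2]
expression = {
    "Gene 1": [1.6, 0.5],
    "Gene 2": [-0.5, -1.9],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Gene-to-gene distance: compare Gene 1 with Gene 2 inside each sample.
gene_distance = euclidean(expression["Gene 1"], expression["Gene 2"])

# Sample-to-sample distance (the commenter's formula): compare the samples inside each gene.
sample_1 = [values[0] for values in expression.values()]
sample_2 = [values[1] for values in expression.values()]
sample_distance = euclidean(sample_1, sample_2)

print(round(gene_distance, 2))    # 3.19
print(round(sample_distance, 2))  # 1.78
```

Under that assumption the gene-to-gene distance comes out to about 3.19, which would round to the 3.2 another commenter quotes at 7:28; the sample-to-sample distance answers a different question.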

  • @farzanaferdousi9885
    @farzanaferdousi9885 3 years ago +1

    Your explanation is very clear to me and I watch all your videos. You are very friendly to me. I like you very much.

  • @davidescobar4449
    @davidescobar4449 5 years ago +3

    I have to congratulate you on this video; it gives the basic notions of hierarchical clustering quickly and easily. Bravo!

  • @nnnyin6967
    @nnnyin6967 1 year ago +1

    I am preparing for my actuarial exam and you saved me a lot❤

  • @liranzaidman1610
    @liranzaidman1610 4 years ago +19

    Very nice.
    I use this in Python and it's a really good way to cluster.
    Another thing: from a coding perspective, it's only 1 line of code in Seaborn, very easy.

  • @calebsawe8307
    @calebsawe8307 2 years ago +1

    I am super grateful for this video. You are such an excellent teacher! Thank you for being such a "you"

  • @mountainsunset816
    @mountainsunset816 1 year ago +2

    The opening is always funny

  • @KasperRasmussen-z3y
    @KasperRasmussen-z3y 1 year ago

    This channel is truly a treasure trove! I was wondering if you could do a video on consensus clustering? I.e. how to evaluate clustering across multiple models and parameters. You are awesome!

  • @abhayjoshi2121
    @abhayjoshi2121 2 years ago +1

    You are simply amazing !! I love your style and simplicity and the word is BAM! .. your videos are very informative and worth going through... thanks for all your hard work in simplifying the complex topics

  • @gurkanyesilyurt4461
    @gurkanyesilyurt4461 4 years ago +1

    you saved yet another day Josh. Thank you

  • @Paulamiz
    @Paulamiz 3 years ago +2

    Watching this after watching your more recent videos. Missed your 'BAM's a lot!!! You should remake these old videos again! Thanks :)

    • @statquest
      @statquest  3 years ago +2

      bam! :)

    • @Paulamiz
      @Paulamiz 3 years ago +2

      @@statquest 😍

    • @vakarthi4
      @vakarthi4 3 years ago

      Found this gem of a channel today. Agreed on the fun rhymes and puns.

  • @옹늬야아
    @옹늬야아 11 months ago +1

    You saved my life😇 Thank you very much.
    And I think the link for the sample code in R isn't available right now...

    • @statquest
      @statquest  11 months ago

      Yep, that's a really old link. Here's a new one: statquest.org/statquest-hierarchical-clustering/

  • @kurniadi-5492
    @kurniadi-5492 2 years ago +2

    It doesn't define whether it must be the shortest Euclidean distance or something else, or, basically, what makes one branch of the dendrogram shorter than another.

    • @statquest
      @statquest  2 years ago

      I'm not sure I understand your comment. Can you clarify?

  • @2327853
    @2327853 5 years ago +2

    @StatQuest please explain probability and Naive Bayes. Thanks in advance! I am a huge fan of your way of teaching and your small songs creations. Keep up the good work!

  • @anastasiyakuznetsova8797
    @anastasiyakuznetsova8797 3 years ago +1

    The best as always! Love this channel! It's super easy to understand

  • @congchen170
    @congchen170 7 years ago

    Joshua's video is always helpful. Next time, probably k-means clustering.

  • @manuelsokolov
    @manuelsokolov 1 year ago +1

    Dear StatQuest! Thank you for the explanation.
    1. What is the best way you would evaluate the algorithm (silhouette score, ...) to decide which clustering method and distance to use? (I understand that the silhouette score is good for choosing the number of clusters, k, but not for deciding between algorithms.)
    To decide on the best algorithm, I have been plotting PCA and coloring the points by the clusters created this way, to see whether the clusters make sense. (However, it is known from the literature that PCA does not work well for evaluating binary data.)
    2. In the case that the data is binary (e.g. genomic alteration data instead of expression data), what kind of distance would you use?
    Best Regards, Manuel

    • @statquest
      @statquest  1 year ago

      1) I guess it depends. If I had "training" data, with known categories, I would compare how many times the data were correctly and incorrectly grouped. Otherwise, it really just boils down to subjective preference.
      2) If you measure a lot of things, the Euclidean distance will still work in this situation.
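
With 0/1 data, each squared difference is either 0 or 1, so the Euclidean distance between two samples is simply the square root of the number of positions where they disagree (the Hamming distance). That is one way to read the reply: the more things you measure, the more positions there are that can disagree. A minimal Python sketch with hypothetical alteration profiles:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mismatches(a, b):
    """Number of positions where two 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical alteration profiles (1 = altered, 0 = not altered), one entry per gene
s1 = [1, 0, 1, 1, 0, 0]
s2 = [1, 0, 1, 0, 0, 0]
s3 = [0, 1, 0, 0, 1, 1]

for other in (s2, s3):
    d = euclidean(s1, other)
    # Each squared difference is 0 or 1, so d == sqrt(number of mismatches).
    print(round(d, 3), mismatches(s1, other), math.isclose(d, math.sqrt(mismatches(s1, other))))
```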

  • @proggenius2024
    @proggenius2024 8 months ago +1

    awesome content and delivery

    • @statquest
      @statquest  8 months ago

      Glad you think so!

  • @zzzluke8906
    @zzzluke8906 1 year ago +1

    Hi Josh, amazing video as always. Do you think you could make a video on how to determine the best number of clusters? I get the Elbow method, but I really struggle with the inconsistency method. I was looking at the inconsistency coefficients, and I am confused about whether they include singleton clusters or exclude them. I am also confused about what exactly the "jump" in the inconsistency coefficient is that we are supposed to look out for.

    • @statquest
      @statquest  1 year ago

      I'll keep that topic in mind.

  • @tudorpricop5434
    @tudorpricop5434 1 year ago

    At 7:28, we calculated 3.2 as the distance between Gene 1 and Gene 2. But the whole purpose of the calculation is to figure out which gene is the most similar to Gene 1 (for example).
    Now my question: after we compute the values for [Gene 1 and Gene 2], [Gene 1 and Gene 3] and [Gene 1 and Gene 4], do we select the gene with the SMALLEST VALUE as the most similar gene to Gene 1? Or the BIGGEST VALUE? I think the smallest, but just to be sure..

    • @statquest
      @statquest  1 year ago

      In this case we want the smallest distance, which means the most similar.
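
A minimal Python sketch of that rule: compute the distance from Gene 1 to every other gene and take the minimum. All expression values below are made up for illustration.

```python
import math

# Hypothetical expression values: gene -> [Sample #1, Sample #2]
genes = {
    "Gene 1": [1.6, 0.5],
    "Gene 2": [-0.5, -1.9],
    "Gene 3": [1.2, 0.9],
    "Gene 4": [-1.0, 0.8],
}

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distance from Gene 1 to each of the other genes
distances = {
    name: euclidean(genes["Gene 1"], values)
    for name, values in genes.items()
    if name != "Gene 1"
}

# Smallest distance = most similar
most_similar = min(distances, key=distances.get)
print({name: round(d, 2) for name, d in distances.items()})
print(most_similar)  # Gene 3
```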

  • @99harshini
    @99harshini 5 years ago +6

    Absolutely brilliant..Thank you sooo much for your time and effort!

  • @saipanchajanya5980
    @saipanchajanya5980 4 years ago +1

    This is Awesome......
    Please Make a session on K Modes, KNN and K Prototypes

    • @statquest
      @statquest  4 years ago

      Here's a complete list of my videos so far: statquest.org/video-index/

  • @eamiller12
    @eamiller12 2 years ago +1

    THANK YOU! This has been SO HELPFUL!

  • @moikanal4625
    @moikanal4625 9 days ago +1

    thanks for amazing lessons

  • @govamurali2309
    @govamurali2309 3 years ago

    Josh, how do we figure out the colors in the first place? @8:47.. Say we measure the genes. Red denotes values from 0.8-1, blue denotes values from 0.1-0.2. Am I right?

    • @statquest
      @statquest  3 years ago +1

      The coloring is actually arbitrary. Usually we like to have a gradient from the maximum value to the minimum value, but there is no rule that says we should only use 2 colors. We could use 3 or more. The idea is simply to create an image that is informative and useful.

    • @govamurali2309
      @govamurali2309 3 years ago

      @@statquest Thanks Josh!!
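
The reply describes a gradient from the minimum value to the maximum value. One simple way to build a two-color gradient like that is linear interpolation between two RGB colors; the blue-to-red choice, the function name and the cell values below are all hypothetical.

```python
def lerp_color(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0)):
    """Map a value in [vmin, vmax] to an RGB color on a straight line from `low` to `high`."""
    t = (value - vmin) / (vmax - vmin)   # 0 at the minimum, 1 at the maximum
    t = min(max(t, 0.0), 1.0)            # clamp anything outside the range
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))

# Hypothetical heatmap cells
values = [-1.9, -0.5, 0.5, 1.6]
vmin, vmax = min(values), max(values)
for v in values:
    print(v, lerp_color(v, vmin, vmax))
```

Using 3 or more colors, which the reply says is fine, would mean interpolating piecewise between neighboring colors instead of across a single pair.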

  • @ramsha8540
    @ramsha8540 8 months ago

    10:08 Do you have any videos that talk about clustering in R?
    Thank you for all your explanations, btw!!

    • @statquest
      @statquest  8 months ago

      Unfortunately, no. :(

  • @MihirSriramVadali
    @MihirSriramVadali 6 months ago

    Great channel. You clearly explained almost all the topics I watched on ML. One question here: what does "gene" stand for? Is it a feature of the data?

    • @statquest
      @statquest  6 months ago

      Yes, it's a feature.

  • @emamulmursalin9181
    @emamulmursalin9181 3 years ago +2

    Great explanation Josh! Just one question: are we clustering samples (data points) or genes (features)? If we are clustering genes, doesn't that mean we are just clustering the correlated features?

    • @statquest
      @statquest  3 years ago

      In this video we are clustering the genes, and yes, the idea is that correlated features are brought together. We could even just calculate the correlation coefficient for each pair and cluster based on those values.

    • @emamulmursalin9181
      @emamulmursalin9181 3 years ago

      @@statquest Thanks for your reply.
      But I have seen some other blogs where authors plot 2D data points and use hierarchical clustering. So in real life, do we use hierarchical clustering for data clustering or feature clustering?

    • @statquest
      @statquest  3 years ago

      @@emamulmursalin9181 I'm not sure what you mean by "data" clustering, however, we can cluster the rows or the columns with similar ease. It doesn't matter if one is features and the other is samples.

    • @emamulmursalin9181
      @emamulmursalin9181 3 years ago

      @@statquest Sorry for using an unclear term. I actually meant "samples" when I said "data".
      So, can hierarchical clustering be used for "feature clustering" (for example, finding correlated features and removing the redundant ones) and also for "sample clustering" (e.g. just like k-means clustering)?

    • @statquest
      @statquest  3 years ago +1

      @@emamulmursalin9181 Yes. We can cluster the rows just as easily as we cluster the columns.
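
The two points from this thread can be sketched together in Python: clustering the columns instead of the rows is the same computation on the transposed matrix, and a correlation coefficient can be turned into a distance. Using 1 - r (Pearson correlation) is one common choice, assumed here rather than taken from the video; the matrix is made up.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def correlation_distance(a, b):
    """1 - r: about 0 for perfectly correlated, about 2 for perfectly anti-correlated."""
    return 1 - pearson(a, b)

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

# Hypothetical matrix: rows = genes, columns = samples
matrix = [
    [1.0, 2.0, 3.0, 4.0],   # Gene 1
    [2.0, 4.0, 6.0, 8.0],   # Gene 2 (rises with Gene 1)
    [4.0, 3.0, 2.0, 1.0],   # Gene 3 (falls as Gene 1 rises)
]

# Clustering rows compares genes; clustering columns compares samples,
# which is the same computation on the transposed matrix.
rows = matrix
cols = transpose(matrix)
print(round(correlation_distance(rows[0], rows[1]), 3))  # 0.0
print(round(correlation_distance(rows[0], rows[2]), 3))  # 2.0
print(len(cols), "samples, each with", len(cols[0]), "gene values")
```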

  • @hamidkiangaikani
    @hamidkiangaikani 3 years ago +1

    4.4 K likes, zero dislikes! You're awesome. Thanks very much

  • @shamanthrajreddy1230
    @shamanthrajreddy1230 2 years ago +1

    Excellent explanation!

  • @sickleharvestsleeks
    @sickleharvestsleeks 3 years ago

    9:44 average clusters is mean linkage; centroid is centroid of a cluster?

    • @statquest
      @statquest  3 years ago

      I'm not sure I understand your question.

  • @isha996
    @isha996 6 years ago +1

    Please add a video on Latin Square design, Joshua!
    I am going to pass my stats final tomorrow, only because of your videos :D
    your students are lucky.

    • @isha996
      @isha996 6 years ago +1

      The CPA and clustering question was worth 30% of total marks on my exam today, and I managed to write them so well only because of your videos. you're a savior. Thank you!!

  • @davidcartwright337
    @davidcartwright337 5 years ago +2

    great videos, I like the way you explain these topics

  • @aggelosdidachos3073
    @aggelosdidachos3073 4 years ago

    Hello, I am Angelos Didachos and I have a question for StatQuest. 9:54 Is the way of comparing a point to a cluster the same as before? That is, Manhattan distance, Euclidean distance, etc.?
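
In hierarchical clustering generally, the point-to-point metric (Euclidean, Manhattan, ...) stays the same; what changes once a cluster is involved is the linkage, i.e. how those point-to-point distances are combined into one number. Here is a minimal Python sketch of the linkages mentioned in this thread (single, complete, average, centroid), using a hypothetical point and a hypothetical two-gene cluster; it does not assume which linkage the video uses at 9:54.

```python
import math
from itertools import product

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]

# Each linkage turns point-to-point distances into a cluster-to-cluster distance.
def single(c1, c2, dist=euclidean):
    return min(dist(a, b) for a, b in product(c1, c2))      # closest pair

def complete(c1, c2, dist=euclidean):
    return max(dist(a, b) for a, b in product(c1, c2))      # farthest pair

def average(c1, c2, dist=euclidean):
    return sum(dist(a, b) for a, b in product(c1, c2)) / (len(c1) * len(c2))

def centroid_linkage(c1, c2, dist=euclidean):
    return dist(centroid(c1), centroid(c2))                 # distance between the means

point = [[0.0, 0.0]]                 # a single gene is just a cluster of size 1
cluster = [[3.0, 0.0], [5.0, 0.0]]   # hypothetical two-gene cluster
for linkage in (single, complete, average, centroid_linkage):
    print(linkage.__name__, linkage(point, cluster))
# single 3.0
# complete 5.0
# average 4.0
# centroid_linkage 4.0
```

Passing a Manhattan function as `dist` would change the metric without changing the linkage, which is why the two choices are usually made separately.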

  • @mayconmarcao4554
    @mayconmarcao4554 2 years ago +1

    Hey Josh, what is the difference between PCA and hierarchical clustering? Could you give me an example of each one? I know some people say "PCA groups variables" and "HC groups observations". I think the output of each one reflects that explanation. But it seems we could use both techniques to answer the same question...

    • @statquest
      @statquest  2 years ago +1

      Although both methods can be applied to the exact same problem (and frequently are both applied to the same problem), they have different strengths. PCA, for example, has loading scores, which would tell us how much each individual variable contributes to the clustering. In contrast, hierarchical clustering gives us a nice heatmap style graph that makes it easy to see the big picture in how and why things are similar and different. I say "try them both."

    • @mayconmarcao4554
      @mayconmarcao4554 2 years ago +1

      @@statquest BAMM! I got it. Thank you Sir!

  • @ankitabhavsar886
    @ankitabhavsar886 6 months ago +1

    the intro.......nice one bro🖐

  • @robertogff
    @robertogff 4 years ago

    Congratulations! Your video is so great! You explain it in a very clear and simple way.

  • @preranadas4037
    @preranadas4037 4 years ago +4

    Hello Josh! The videos are soooooooo goooood! These are BAMMMMM Good!!
    1 request - Could you please create a video on LCA - Latent Class Analysis? Maybe by comparing it to k-means clustering? I cannot be more thankful!

  • @muhammadiqbalmarzuki
    @muhammadiqbalmarzuki 4 years ago +1

    This video is super duper bam bam double double bam!
    Will you cover more advanced clustering techniques such as model-based clustering (MCLUST) and weighted gene co-expression network analysis (WGCNA)? I'm learning about these things now for my research, and will be very grateful if you can cover these topics for me. Thanks! :)

  • @rodrigohaasbueno8290
    @rodrigohaasbueno8290 5 years ago +1

    I love this channel so much

  • @jovanmampusti4025
    @jovanmampusti4025 3 years ago +1

    Thank you so much sir! This is very helpful and very informative.

    • @statquest
      @statquest  3 years ago +1

      Glad it was helpful!

  • @mikecy5507
    @mikecy5507 2 years ago

    Great channel! Clear explanations. In HCA, could you not follow up the clustering of rows (genes) by clustering the columns (samples)? Is this automatically done? Does not seem like the best heatmap would be produced if you just cluster/shuffle rows. Would have to cluster/shuffle columns, too, right? Also, must/should the data be standardized first?

    • @statquest
      @statquest  2 years ago

      You can cluster both columns and rows. And sometimes standardizing helps, sometimes it doesn't. It's worth trying both options.

    • @mikecy5507
      @mikecy5507 2 years ago +1

      @@statquest Thanks!

  • @python_information601
    @python_information601 3 years ago +1

    Nice explanation 👍👍

  • @HanyMostafa-sk9ml
    @HanyMostafa-sk9ml 12 days ago

    The part that I don't understand is the blue and orange at the top: did you apply hierarchical clustering to the genes and to the samples?

    • @statquest
      @statquest  12 days ago

      What time point in the video, minutes and seconds, are you asking about?

  • @the_data_panda
    @the_data_panda 5 years ago +2

    @StatQuest with Josh Starmer, in this video you are clustering and combining genes (the attributes of data), aren't you supposed to cluster and combine the samples? that's the inverse of the approach shown

    • @statquest
      @statquest  5 years ago +5

      You can cluster the samples or the genes, or both! It all depends on the question you are asking. For example, if I have some healthy people and some sick people, I might be interested in clustering the people (to see if healthy people form one cluster and unhealthy people form another) or I might be interested in clustering the genes. In this case I would find out which genes are correlated and up-regulated in healthy people compared to unhealthy people. Or I could do both. Does that make sense?

  • @cfonsecaparis812
    @cfonsecaparis812 3 years ago +1

    Hi Josh, I am really enjoying your videos, especially the wha whas and the bams!! You make stats sound easy but also fun! Thank you! I wonder if you could please do a video explaining the different uses of PCA and HCA: when do you use one or the other? In the meantime I will watch your videos on PCA and HCA :) hooray!

    • @statquest
      @statquest  3 years ago

      BAM! Thank you very much! I'll keep that topic in mind.

  • @CapoeiraPiper
    @CapoeiraPiper 4 years ago +1

    Man your videos are soo super helpful! THANK YOU (ps consider the color library viridis to make it easier for the colorblind)

  • @alyssawang144
    @alyssawang144 3 years ago +1

    fantastic explanation, thank you so much for this video.

  • @oliviagallupova9199
    @oliviagallupova9199 5 years ago +1

    You saved me a week

  • @LetWorkTogether
    @LetWorkTogether 5 years ago +3

    I love this. Your video is wonderful!

  • @MrKingoverall
    @MrKingoverall 5 years ago +2

    I LOVE YOU JOSH !

  • @mojtabasardarmehni453
    @mojtabasardarmehni453 3 years ago +1

    Great as always! Thanks.

  • @fabiomaia3433
    @fabiomaia3433 4 years ago +3

    Hey Josh! Your videos are great! Thank you for the effort you've put on it!
    If you allow me... have you considered making videos explaining DBSCAN and HDBSCAN?

    • @statquest
      @statquest  4 years ago +2

      Yes, I've thought about those topics and may make a video about them.

  • @lazyboy7521
    @lazyboy7521 3 years ago

    Great video! There is a minor mistake around 8:21. You should replace "sample" by "gene" in calculating the distance, i.e., |difference in gene #1| + |difference in gene #2| +...

    • @statquest
      @statquest  3 years ago

      I believe the video is correct. For details, see: 6:01

  • @balajicanchi5538
    @balajicanchi5538 7 years ago

    Explained in a simple manner.

  • @surbhardwaj1721
    @surbhardwaj1721 3 years ago

    Amazing explanation. Please make a video on Cluster evaluation. :)

    • @statquest
      @statquest  3 years ago

      I'll keep that in mind.

  • @fellsantfernandoargentin2072
    @fellsantfernandoargentin2072 6 years ago

    Congratulations from Brazil!

  • @tymothylim6550
    @tymothylim6550 3 years ago +1

    Thank you very much for this video! It was really well done :)

  • @setareht7546
    @setareht7546 3 years ago

    Thank you for all your videos clearly explaining complex concepts. Can you also make video(s) on different bi-clustering methods?

    • @statquest
      @statquest  3 years ago

      I'll keep that in mind.

  • @saikiranjajula2033
    @saikiranjajula2033 4 years ago +1

    Thank You Sir, It was awesome to learn from you.

  • @charliekpeng
    @charliekpeng 2 years ago

    How do you tell whether using Euclidean or Manhattan Distance would be more insightful without having to run both?

    • @statquest
      @statquest  2 years ago +1

      Sometimes you know from how the data are generated (are you comparing commute times in Manhattan? then use the Manhattan distance), but usually you have to run both.
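
A small Python sketch of the two distances, using a made-up street-grid example in the spirit of the reply. The second part shows why the choice can matter: the two metrics can disagree about which of two points is closer, which in turn can change which clusters get merged first.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical trip: 3 blocks east and 4 blocks north on a street grid.
start, end = (0, 0), (3, 4)
print(euclidean(start, end))  # 5.0 (as the crow flies)
print(manhattan(start, end))  # 7 (walking along the streets)

# The two metrics can rank the same points differently.
origin, p, q = (0, 0), (3, 3), (0, 5)
print(euclidean(origin, p) < euclidean(origin, q))  # True: p is closer
print(manhattan(origin, p) < manhattan(origin, q))  # False: q is closer
```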

  • @huikianong1695
    @huikianong1695 3 years ago

    Hi, may I ask about clustering the columns? Is it possible to cluster the columns and rows at the same time? Correct me if I am wrong: clustering the rows means grouping genes that have similar expression across the different samples, and clustering the columns means grouping the samples with similar gene expression?

    • @statquest
      @statquest  3 years ago

      Sure! You can cluster both the rows and columns at the same time.

  • @urjaswitayadav3188
    @urjaswitayadav3188 7 years ago

    Great explanation. Thanks StatQuest!

  • @maikfranke2303
    @maikfranke2303 2 years ago +1

    Amazing! Your videos are so comprehensible. I really enjoy watching!!! *_*

  • @ardaugurlu8673
    @ardaugurlu8673 6 years ago +2

    Good job mr josh.

  • @subhabrataghosh9831
    @subhabrataghosh9831 3 years ago +1

    Excellent Sir

  • @sonakshigarg4273
    @sonakshigarg4273 5 years ago

    You could explain the same concept with some other datasets, and with better visualisations than a heatmap

  • @yyma8037
    @yyma8037 4 years ago

    Great video!
    Do you have any plans to talk about co-clustering? I look forward to it.

  • @LBsCuriosity
    @LBsCuriosity 5 years ago

    really awesome video! This will help me with my test. Thank you!

  • @patriciacontreras8435
    @patriciacontreras8435 8 months ago

    Thank you very much!🥰 You saved my life 🥲
    I have a question, if my dataset has continuous variables (ex. income) and a discrete variable (ex. number of children in the household). How can I measure the distance between them? Thank you!!!

    • @statquest
      @statquest  8 months ago +1

      You can use one-hot-encoding ua-cam.com/video/589nCGeWG1w/v-deo.html or you can use a random forest to do the clustering ua-cam.com/video/sQ870aTKqiM/v-deo.html

    • @patriciacontreras8435
      @patriciacontreras8435 8 months ago +1

      @@statquest Thanks again! I think I will learn a lot if I subscribe to this channel 🥰🥰
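
A minimal Python sketch of the one-hot-encoding suggestion, assuming income has already been rescaled to the 0-1 range and treating each observed number of children as its own category. The households and helper names are hypothetical. One consequence worth knowing: after one-hot encoding, any two different child counts end up the same distance apart, so 0 vs. 2 children counts no more than 0 vs. 1 would.

```python
import math

def one_hot(value, categories):
    """Return a 0/1 indicator for each category (exactly one 1 if value is a known category)."""
    return [1.0 if value == c else 0.0 for c in categories]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical households: (income already rescaled to 0-1, number of children)
households = {"H1": (0.20, 0), "H2": (0.25, 2), "H3": (0.80, 0)}
child_levels = sorted({kids for _, kids in households.values()})  # [0, 2]

# Continuous column kept as-is, discrete column expanded into indicator columns
encoded = {
    name: [income] + one_hot(kids, child_levels)
    for name, (income, kids) in households.items()
}
print(encoded["H2"])  # [0.25, 0.0, 1.0]
print(round(euclidean(encoded["H1"], encoded["H2"]), 3))
print(round(euclidean(encoded["H1"], encoded["H3"]), 3))
```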

  • @daminithandele7237
    @daminithandele7237 4 years ago +1

    Hi Josh! Can you please make a video on DBSCAN, if possible? Especially the parameter tuning part of it, I'm sure that would be of great help to lots of people.

    • @statquest
      @statquest  4 years ago

      I'll keep that in mind.

  • @solibozorgmehr6524
    @solibozorgmehr6524 4 years ago

    Thanks for the explanation. Can you please make a video about consensus NMF clustering?

    • @statquest
      @statquest  4 years ago

      I'll keep that in mind.

  • @madmocro8076
    @madmocro8076 5 years ago +1

    You make my data science minor achievable lol, thanks!

  • @糜家睿
    @糜家睿 6 years ago +1

    Hi, Joshua. Do you know the basics of pseudotime analysis in single-cell RNA-seq? Can you make a short video about the basics? Thanks!

    • @statquest
      @statquest  6 years ago +1

      I'll put that on the to-do list!

  • @taleco21
    @taleco21 3 years ago

    Hey, Josh, is there any video in which you address unsupervised and supervised hierarchical clustering of gene and lincRNA expressions? If not, could you do a video about that or provide me with some links to read about? I can't find any. Thanks.

    • @statquest
      @statquest  3 years ago +1

      This video is unsupervised hierarchical clustering.

    • @taleco21
      @taleco21 3 years ago +1

      @@statquest oh, yeah, thanks. I just did some readings about unsupervised and got more info. I’ll keep searching for supervised clustering. Thanks a lot! Great video.

  • @lucha6262
    @lucha6262 4 years ago

    Could you show the maths/expression for when you're calculating the Euclidean distance for more than 2 genes?

    • @lucha6262
      @lucha6262 4 years ago

      I'm doing the maths and I think I've answered my own question: you would never really calculate the distance between more than 2 things at once, whether that be two genes, two clusters, or a cluster and a gene, right? And then for more than 2 samples you would do D = sqrt(d1^2 + d2^2 + d3^2), correct?

    • @statquest
      @statquest  4 years ago

      You are correct!
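
For reference, the formula the commenter arrives at generalizes to any number of samples. Writing $x_{a,s}$ for the value of gene $a$ in sample $s$ and $n$ for the number of samples (notation chosen here, not taken from the video), the Euclidean distance between genes $a$ and $b$ and, for comparison, the Manhattan distance are:

```latex
D_{\text{Euclidean}}(a, b) = \sqrt{\sum_{s=1}^{n} \left(x_{a,s} - x_{b,s}\right)^2}
\qquad
D_{\text{Manhattan}}(a, b) = \sum_{s=1}^{n} \left|x_{a,s} - x_{b,s}\right|
```

With $n = 3$ and $d_s = x_{a,s} - x_{b,s}$, the first one is exactly $\sqrt{d_1^2 + d_2^2 + d_3^2}$.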

  • @Sean-lz2dh
    @Sean-lz2dh 2 years ago +1

    great video. thank you very much

  • @monishaap08
    @monishaap08 5 years ago

    How do you validate these clustering techniques? I mean, for a given dataset, let's assume I have tried various hierarchical clustering techniques, like single linkage, complete linkage, etc., using various distance metrics for each method. How do I pick the right one from all the different clusterings that have been formed for that particular dataset?

    • @statquest
      @statquest  5 years ago

      This is going to sound very disappointing, but since these methods are generally used to explore data and extract new insights from it, then you pick the method that gives you the most insight. So try them and see if one makes more sense than the others.
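
Since the reply's advice is to try several methods and compare, here is a naive Python sketch of agglomerative clustering that makes that easy: start with every item in its own cluster and repeatedly merge the two closest clusters, with the linkage (single, complete, average) as a parameter. The four 1-D points are made up, and the code is an illustration, not how any particular library implements it (it recomputes every pairwise distance at each step, so it is slow for real data).

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# How to summarize all point-to-point distances between two clusters.
LINKAGES = {
    "single": min,                               # closest pair
    "complete": max,                             # farthest pair
    "average": lambda ds: sum(ds) / len(ds),     # mean of all pairs
}

def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering: repeatedly merge the two closest clusters.

    `points` maps a name to a coordinate tuple. Returns the merges in order as
    (cluster_a, cluster_b, height) tuples, where each cluster is a tuple of names.
    """
    combine = LINKAGES[linkage]

    def cluster_distance(c1, c2):
        return combine([euclidean(points[a], points[b]) for a, b in product(c1, c2)])

    clusters = [(name,) for name in points]      # every item starts alone
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, round(cluster_distance(c1, c2), 3)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical 1-D data
points = {"A": (0.0,), "B": (3.2,), "C": (5.0,), "D": (6.5,)}
for method in LINKAGES:
    print(method, agglomerate(points, method))
```

On these points all three linkages first merge C and D, but single and average linkage then attach B to (C, D), while complete linkage joins A with B instead, so the resulting trees differ and you would have to judge which one gives more insight.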