Machine Learning Lecture 29 "Decision Trees / Regression Trees" -Cornell CS4780 SP17

  • Published 13 Jan 2025

COMMENTS • 46

  • @prattzencodes7221
    @prattzencodes7221 4 years ago +59

    With all due respect to Professor Andrew Ng for the absolute legend he is, Kilian, you, sir, are every ML enthusiast's dream come true. 🔥🔥🔥🔥🔥

  • @bansaloni
    @bansaloni 4 months ago +1

    I have been watching your lectures for years now. I must say, the style of teaching is the best! Every time I need a refresher on some topic, your ML series is the first I think of. Thank you for the amazing content! 😃

  • @AnoNymous-wn3fz
    @AnoNymous-wn3fz 3 years ago +11

    15:13 introducing Gini impurity
    23:50 KL algor
    46:00 Bias-Variance discussion

  • @orkuntahiraran
    @orkuntahiraran 3 years ago +4

    This is perfect. I am coming from a non-technical, non-math background; and this presentation really made me understand DT easily. Thank you very much.

  • @abhishekkdas7331
    @abhishekkdas7331 4 years ago +2

    Thanks Professor Kilian Weinberger. I was looking for a refresher on the topic after almost 5 years and you have made it as easy as possible :) !

  • @khonghengo
    @khonghengo 4 years ago +1

    Thank you very much, Prof. Weinberger. I was reading The Elements of Statistical Learning for my reading course when I found your channel. I truly appreciate your lectures and your notes. I print all of your notes and watch almost all of your videos; they are extremely helpful. Thank you, I really appreciate that you let us have access to your wonderful lectures.

  • @jalilsharafi
    @jalilsharafi 3 years ago +1

    I'm watching this at the end of December 2021. I found the demos at the end, starting roughly 45 minutes into the video, very informative about the capabilities and limitations of a decision tree. Thanks.

  • @silent_traveller7
    @silent_traveller7 3 years ago +2

    Hats off to you sir. This series coupled with lecture notes is pure gold. I have watched several lecture series on youtube till the end but wow this lecture series has the most retentive audience.

  • @varunjindal1520
    @varunjindal1520 4 years ago +1

    Thanks, Professor Kilian Weinberger. The examples at the end were really helpful for visualizing what trees can look like.

  • @TrentTube
    @TrentTube 5 years ago +24

    Kilian, is there some way I can contribute to you for your efforts in creating this series? It's been fantastically entertaining and helped in my understanding of these topics profoundly.

  • @cacagentil
    @cacagentil 3 years ago

    Thank you for sharing your content.
    It is very interesting, especially the discussion about why we do this (computational problems, NP-hardness, people tried many splits and found this one worked best in practice), the interactive examples at the end (very useful for learning), and all your work on making it clear and simple. I like the point of view of minimizing the entropy by maximizing the KL divergence between two probability distributions.
    In fact, it is also easy to see the Gini impurity as an optimization problem in 3D: you get a concave/convex function (check the Hessian matrix with respect to two parameters, since the third is just 1 - p_1 - p_2), which you optimize over a constrained space (the conditions on the p_i), and you can actually draw both the function and the space. You get the maximum/minimum at p_1 = p_2 = p_3 = 1/3 (which we don't want), and it diminishes as we move away from this point (the best case being one p_i equal to 1 and the others 0).
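
The claim in the comment above can be checked numerically. The following is a minimal sketch, assuming the standard definition of Gini impurity, G(p) = 1 - sum_k p_k^2, and a coarse grid over the three-class probability simplex; the names `gini`, `steps`, and `grid` are illustrative and do not come from the lecture.

```python
def gini(p):
    """Gini impurity of a class-probability vector p (assumed to sum to 1)."""
    return 1.0 - sum(q * q for q in p)

# Enumerate a coarse grid over the 3-class simplex: p_3 = 1 - p_1 - p_2.
steps = 30
grid = [
    (i / steps, j / steps, (steps - i - j) / steps)
    for i in range(steps + 1)
    for j in range(steps + 1 - i)
]

# The most impure point on the grid is the uniform distribution.
worst = max(grid, key=gini)
print(worst, gini(worst))

# A pure node (one class only) has zero impurity.
print(gini((1.0, 0.0, 0.0)))
```

On this grid the impurity peaks at the uniform distribution and falls to zero at the corners of the simplex, which matches the picture described in the comment.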

  • @geethasaikrishna8286
    @geethasaikrishna8286 4 years ago +2

    Thanks for awesome lecture & your university for making it available online

  • @nicolasrover8803
    @nicolasrover8803 4 years ago +3

    Thank you very much. Your teaching is incredible

  • @michaelmellinger2324
    @michaelmellinger2324 2 years ago

    @34:28 Can view all of machine learning as compression

  • @KulvinderSingh-pm7cr
    @KulvinderSingh-pm7cr 6 years ago +1

    "No man left behind", wait .. that's Decision trees right ??
    Thanks prof. Enjoyed and learnt a lot!!

  • @rahulseetharaman4525
    @rahulseetharaman4525 2 years ago

    Why do we take a weighted sum of the entropies? What is the intuition behind weighting them rather than simply adding the entropies of the splits?

    • @kilianweinberger698
      @kilianweinberger698  1 year ago +2

      Good question. If you add them, you implicitly give them both equal weight. Imagine you make a split where on one side you only have a single example (e.g. labeled positive), and on the other side you have all n-1 remaining data points. This is a pretty terrible split, because you learn very little from it. However, on the one side with a single example you have zero impurity (all samples, namely only that single one, trivially share the same label). If you give that side as much weight as the other side, you will conclude that this is a great split. In fact, this is what will happen if you simply add them up, the decision tree will one by one split off single data points and create highly pathological "trees". So instead we weigh them by how many points are in the split. This way, in our pathological case, the single example would only receive a weight of 1/n, and not contribute much to the overall impurity of the split. I hope this answers your question.
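
The pathological case described in this reply can be reproduced in a few lines. This is a minimal sketch, assuming base-2 Shannon entropy as the impurity and a made-up set of 16 labels; `summed_impurity` and `weighted_impurity` are hypothetical helper names, not part of the course code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def summed_impurity(left, right):
    """Unweighted sum: both sides count equally, regardless of size."""
    return entropy(left) + entropy(right)

def weighted_impurity(left, right):
    """Each side is weighted by its share of the data points."""
    n = len(left) + len(right)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

# Toy data (an assumption for illustration): 8 positives and 8 negatives.
labels = ["+"] * 8 + ["-"] * 8

# Pathological split: a single example on one side, the other 15 on the other.
pathological = (labels[:1], labels[1:])
# Informative split: each side is dominated by one class.
informative = (["+"] * 7 + ["-"], ["+"] + ["-"] * 7)

for name, split in [("pathological", pathological), ("informative", informative)]:
    print(name, summed_impurity(*split), weighted_impurity(*split))
```

With the plain sum, the single-example split scores lower (better) than the informative split, because its zero-impurity side counts as much as the other 15 points. With the weighted sum, that side only carries weight 1/16, and the informative split wins.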

  • @srisaisubramanyamdavanam9912
    @srisaisubramanyamdavanam9912 3 months ago

    compression comparison for cross entropy is just damn good....

  • @mohajeramir
    @mohajeramir 4 years ago +1

    This was amazing. Thank you very much

  • @utkarshtrehan9128
    @utkarshtrehan9128 4 years ago +1

    Machine Learning ~ Compression 💡

  • @Charby0SS
    @Charby0SS 4 years ago +1

    Would it be possible to split using something similar to Gaussian processes instead of the brute force method?
    Great lecture btw :)

  • @hohinng8644
    @hohinng8644 2 years ago

    28:24 this sounds like a horror movie lol

  • @filipgaming1233
    @filipgaming1233 4 months ago

    wonderful lecture

  • @michaelmellinger2324
    @michaelmellinger2324 2 years ago

    Decision trees are horrible. However, once you address the variance with bagging and the bias with boosting, they become amazing. @12:50

  • @yunpengtai2595
    @yunpengtai2595 3 years ago

    I have some questions about regression. I wonder if I could discuss them with you.

  • @zaidamvs4905
    @zaidamvs4905 11 months ago

    I have a question: how do we know the best sequence of features to use at each depth level? If we try every feature and optimize over 30 to 40 features, it will take forever. How can we do this for m features? I would really like to visualize how this works.

  • @dominikmazur4196
    @dominikmazur4196 1 year ago

    Thank you 🙏

  • @mathedelic5778
    @mathedelic5778 6 years ago +2

    Very good!

  • @gregmakov2680
    @gregmakov2680 2 years ago

    giang bai ma long ghep tum lum het nha :D:D:D met ca nup lum bat bo tu bi gio :D:D:D thay hu qua diiii

  • @KW-md1bq
    @KW-md1bq 4 years ago +2

    Should probably have mentioned the log used in Information Gain is Base 2.

  • @prabhatkumarsingh8668
    @prabhatkumarsingh8668 4 years ago

    The formula shown for Gini impurity is applied at the leaf node, right? And the Gini impurity for the attribute is the weighted value?

    • @kilianweinberger698
      @kilianweinberger698  4 years ago +3

      Essentially you compute the weighted Gini impurity for each attribute, for each possible split.
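
A rough sketch of what this reply describes: for every attribute and every candidate threshold, compute the size-weighted Gini impurity of the two sides and keep the lowest. Placing thresholds at midpoints between consecutive distinct feature values is an assumption made here for illustration, and `best_split` is a hypothetical name; the data are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustively search all features and thresholds.

    Returns (weighted_gini, feature_index, threshold) for the split
    x[feature_index] <= threshold with the lowest weighted impurity.
    """
    n = len(y)
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

# Toy data: feature 0 separates the two classes cleanly, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
print(best_split(X, y))
```

A tree builder would apply the same search recursively to the left and right subsets, so only one feature and one threshold are chosen per node rather than a whole sequence of features up front.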

  • @vocabularybytesbypriyankgo1558
    @vocabularybytesbypriyankgo1558 3 months ago

    Thanks !!!

  • @shaywilliams629
    @shaywilliams629 3 years ago

    Forgive me if I'm wrong but if a pure leaf node with 3 classes that results in P1=1, P2=0, P3=0, the sum of Pk*log(Pk) would be 0, so the idea would be to minimize from the positive entropy equation?

  • @yashwardhanchaudhuri6966
    @yashwardhanchaudhuri6966 2 years ago

    Hi can anyone please explain why equally likely events are a problem in decision trees? What I understood from it was that the model will need to be very comprehensive to tackle such cases but I am unsure of my insight.

    • @yashwardhanchaudhuri6966
      @yashwardhanchaudhuri6966 2 years ago

      Okay, so what I understood is that a leaf node cannot have confusion. If a node is a leaf, it should contain all positives or all negatives, not a mix of both, which is what would happen if we stopped building the tree early, right?

  • @KaushalKishoreTiwari
    @KaushalKishoreTiwari 3 years ago +1

    If p_k is zero, that means k is infinite. How is that possible? (Question at 39:00.)

    • @kilianweinberger698
      @kilianweinberger698  3 years ago

      Oh, no. p_k is not 1/k. We are computing the divergence between p_k and 1/k. p_k is the fraction of elements of class k in that particular node, so p_k=0 if there are no elements of class k in that node.
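
To make the distinction in this reply concrete, here is a small sketch that computes the divergence between a node's class fractions p_k and the uniform distribution. It assumes K is the number of classes (so the uniform probability is 1/K), uses base-2 logarithms, and applies the usual convention 0 · log 0 = 0 for classes absent from the node; the function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in bits, skipping classes with p_k = 0."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def kl_to_uniform(p):
    """KL(p || uniform) in bits, where uniform assigns 1/K to each of K classes."""
    K = len(p)
    return sum(pk * math.log2(pk * K) for pk in p if pk > 0)

# A node with no elements of the third class: p_3 = 0 is perfectly fine.
p = [0.5, 0.5, 0.0]
print(kl_to_uniform(p), math.log2(len(p)) - entropy(p))

# A pure node is as far from uniform as possible; a uniform node has zero divergence.
print(kl_to_uniform([1.0, 0.0, 0.0]), kl_to_uniform([1 / 3, 1 / 3, 1 / 3]))
```

The two printed values on the first line agree, since the divergence to the uniform distribution equals log K minus the entropy. Maximizing that divergence is therefore the same as minimizing entropy.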

  • @usamajaved7055
    @usamajaved7055 1 year ago

    Please share past papers of ml

  • @SanjaySingh-ce6mp
    @SanjaySingh-ce6mp 4 years ago

    isn't log(a/b)=log(a)-log(b) ?? at 30:35

    • @kilianweinberger698
      @kilianweinberger698  4 years ago +1

      yes, but here we have log(a/(1/b))=log(a)-log(1/b)=log(a)+log(b).
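
Written out in full, the step in this reply looks roughly as follows. This is a sketch that assumes q is the uniform distribution over K classes (q_k = 1/K) and that the p_k sum to 1; the symbols follow the comments in this thread rather than the lecture slides.

```latex
\begin{align*}
\mathrm{KL}(p \,\|\, q)
  &= \sum_{k=1}^{K} p_k \log \frac{p_k}{1/K} \\
  &= \sum_{k=1}^{K} p_k \left( \log p_k - \log \tfrac{1}{K} \right) \\
  &= \sum_{k=1}^{K} p_k \log p_k + \log K \sum_{k=1}^{K} p_k \\
  &= \sum_{k=1}^{K} p_k \log p_k + \log K .
\end{align*}
```

Since log K is a constant, maximizing this divergence amounts to minimizing the entropy, which is the negative of the first term.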

    • @SanjaySingh-ce6mp
      @SanjaySingh-ce6mp 4 years ago

      @@kilianweinberger698 thank u,i got it now🙏

  • @elsmith1237
    @elsmith1237 5 years ago

    What's a Katie tree?

    • @kilianweinberger698
      @kilianweinberger698  5 years ago +5

      Actually, it is called KD-Tree. A description is here: en.wikipedia.org/wiki/K-d_tree Essentially you recursively split the data set along a single feature to speed up nearest neighbor search.
      Here is also a link to the notes on KD-Trees: www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote16.html
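
For readers who want to see the recursive splitting in code, here is a minimal construction sketch. It assumes the common convention of cycling through the features by depth and splitting at the median point; the linked notes may choose the split feature differently. `build_kd_tree` and the dictionary layout are hypothetical, and the nearest-neighbor search itself is omitted.

```python
def build_kd_tree(points, depth=0):
    """Recursively split points along one feature per level.

    Assumption: the split feature cycles with depth and the split
    point is the median along that feature.
    """
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],   # median point stored at this node
        "axis": axis,           # feature this node splits on
        "left": build_kd_tree(points[:mid], depth + 1),
        "right": build_kd_tree(points[mid + 1:], depth + 1),
    }

tree = build_kd_tree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree["point"], tree["left"]["point"], tree["right"]["point"])
```

Each node splits on a single feature, just like a decision tree node, but the split is chosen from the geometry of the inputs alone rather than from label impurity.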