Tutorial 38- Decision Tree Information Gain

  • Published 31 Jan 2025

COMMENTS • 213

  • @davidbatista3457
    @davidbatista3457 4 роки тому +113

    Bro you are a legend and a half...Prof spent 3 weeks on this and your 12 minute video just explained this beautifully.

    • @Sss-kj1ev
      @Sss-kj1ev 3 роки тому

      You mean a "sawa legend" (a legend and a quarter)

    • @raj-nq8ke
      @raj-nq8ke 3 роки тому

      Are you guys doing a Data Science degree course at college?

    • @Sss-kj1ev
      @Sss-kj1ev 3 роки тому

      @@raj-nq8ke no

    • @davidbatista3457
      @davidbatista3457 3 роки тому

      @@Sss-kj1ev computer engineering!

    • @hemantborse8923
      @hemantborse8923 Рік тому

      @@Sss-kj1ev A "dedh legend" (a legend and a half), smart guy

  • @alish7872
    @alish7872 3 роки тому +7

    Man, I truly appreciate you more than any of my college doctors; hope you achieve all your dreams.

  • @ankitchoudhary5585
    @ankitchoudhary5585 3 роки тому +3

    I have been watching his videos for the last 2+ years (since my college days)... proud to see this community and the growth he has made. Good luck, mate.

  • @Cricketpracticevideoarchive
    @Cricketpracticevideoarchive 4 роки тому +19

    6:36 information gain starts

  • @lahiru954
    @lahiru954 3 роки тому +1

    Great explanation. This is the best channel for becoming a great data scientist.

  • @rccgcol9404
    @rccgcol9404 3 роки тому +1

    Thank you so much for this. There is a lot of confusion in the textbook, but the two videos on entropy and information gain are a lifesaver! You are the best. Just subscribed.

  • @tanmayvaidya8337
    @tanmayvaidya8337 4 роки тому

    The actual working mechanism behind the decision tree algorithm is clearly explained. Thanks for uploading!

  • @vishalaaa1
    @vishalaaa1 4 роки тому

    Naik. Your trainings are very clear and smart.
    50% of the professionals are from an R background. Can you include R code in all the videos as well? Ordering and appropriate, descriptive titles are an excellent approach.
    One tip: please include data types in every discussion, as many functions are sensitive to data types. This gives an option to point out alternatives.
    Ex: K-Means clustering; in 99% of articles no one explains how to do it on categorical data.
    R code will double the number of hits.
    I hope that you will eventually start your own consulting company in data science.

  • @ukc2704
    @ukc2704 4 роки тому +5

    Your teaching skills are amazing.

  • @Relaxing_Music-w6e
    @Relaxing_Music-w6e 10 місяців тому

    LOVE YOU. I WATCHED YOUR ENTROPY VIDEO AND NOW THIS. IT'S SOOO HELPFUL FOR ME SINCE I WANT TO BE A DATA SCIENTIST

  • @kaisersayed9974
    @kaisersayed9974 4 роки тому +7

    Sir, please make more videos on web scraping from scratch for beginners. I am not from a CS background, but for data science I think we should have knowledge of web scraping... sir, please make a video on this topic. You are a great teacher and a role model for me. Thank you sir.

    • @arpitcruz
      @arpitcruz 4 роки тому +2

      Join as a member... there is one end-to-end project related to ML

  • @ashwinideshmukh4920
    @ashwinideshmukh4920 3 роки тому +2

    Hats off to you man!! Your teaching skills are amazing.

  • @RishabhBisht1997
    @RishabhBisht1997 3 роки тому

    Better explanation because of the numerical example used. Absolutely beautiful. Superlike

  • @stevemungai3542
    @stevemungai3542 2 роки тому

    These are the only tutorials I watch and understand the first time

  • @TheAwesomejay
    @TheAwesomejay 3 роки тому +1

    I love the way you explain things. Very clear and easy to digest

  • @rashmidhawan4783
    @rashmidhawan4783 2 роки тому

    Hats off... no one can match you... excellent

  • @maseedilyas203
    @maseedilyas203 4 роки тому

    Best tutorial on ML I could find. Thank you very much, Krish sir. God bless you

  • @spider279
    @spider279 2 роки тому

    Standing ovation for this nice explanation; thank you very much, you're so kind

  • @sandipansarkar9211
    @sandipansarkar9211 4 роки тому +1

    Thanks Krish. There is a slight confusion between entropy and information gain. I am sure it will be clarified in the process.

  • @sumankar06
    @sumankar06 6 місяців тому

    deepest respect, fantastic explanation

  • @BytemeMaybe
    @BytemeMaybe 4 роки тому

    Best explanation among all YouTube videos!

  • @bhargavasavi
    @bhargavasavi 4 роки тому +3

    Hi Krish, I want to mention a small correction here... f1 has 9Y|5N, so f2's Y and N should sum up to 9 and f3's Y and N should sum up to 5... but they are different in both your examples :) Other than this, your lucid explanation of the concepts is quite amazing. You rock!

    • @bhargavasavi
      @bhargavasavi 4 роки тому +1

      I will take that back... 9Y|5N is for the labels... f1 has 2 categories, c1 and c2... c1 is 6Y|2N and c2 is 3Y|3N... In the case of numerical features, we select a threshold to treat the values as categories, and since we get as many thresholds as there are values, it is computationally expensive... Completely understood the concept now.
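
The threshold idea in the reply above can be made concrete. Below is a minimal Python sketch, with a made-up numeric feature and Yes/No labels; the helper names entropy and best_threshold are illustrative, not from the video. Every midpoint between consecutive sorted values is a candidate threshold, and each candidate is scored by the information gain of the resulting two-way split.

    import math
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of a list of class labels, e.g. ["Y", "N", ...]
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

    def best_threshold(values, labels):
        # score every midpoint between consecutive sorted values by information gain
        pairs = sorted(zip(values, labels))
        parent = entropy(labels)
        best = None
        for i in range(1, len(pairs)):
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [label for v, label in pairs if v <= thr]
            right = [label for v, label in pairs if v > thr]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            gain = parent - weighted
            if best is None or gain > best[1]:
                best = (thr, gain)
        return best

    # made-up numeric feature (e.g. age) with Yes/No labels
    print(best_threshold([23, 25, 30, 35, 40, 45], ["Y", "Y", "N", "Y", "N", "N"]))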

  • @arnabdutta404
    @arnabdutta404 4 роки тому +4

    In place of f2 and f3, there should actually be the values of the feature f1 (say, yes or no / high or low). After the split we get the entropy at the individual nodes and can calculate IG for f1. Similarly, IG for f2, f3, etc. can be found, and the feature with the highest IG is chosen. Then go ahead with the further split.
    @krish please correct me if I am wrong

    • @sunnyluvu1
      @sunnyluvu1 4 роки тому +1

      Yes u r right

    • @marijatosic217
      @marijatosic217 3 роки тому +1

      Exactly! :) Also, after the information gain we should compute the intrinsic value and then the gain ratio, which is our final result.
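
For the intrinsic value and gain ratio mentioned in the last reply (the C4.5-style refinement of information gain), here is a rough Python sketch; the class counts [9, 5] split into [6, 2] and [3, 3] are the ones discussed elsewhere in these comments and are used purely as an illustration.

    import math

    def entropy(counts):
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    def gain_ratio(parent, children):
        # parent: class counts before the split, e.g. [9, 5] for 9 Yes / 5 No
        # children: class counts of each child node, e.g. [[6, 2], [3, 3]]
        n = sum(parent)
        weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
        info_gain = entropy(parent) - weighted
        # intrinsic value (split info) penalises splits into many small branches
        intrinsic = -sum(sum(ch) / n * math.log2(sum(ch) / n) for ch in children)
        return info_gain, intrinsic, info_gain / intrinsic

    print(gain_ratio([9, 5], [[6, 2], [3, 3]]))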

  • @neprobos3246
    @neprobos3246 4 роки тому

    Thank you Krish, you explain everything in detail! No words to thank you

  • @Lucia-el6ex
    @Lucia-el6ex 2 роки тому +1

    Thank you for your explanations, you are incredible! You help us so much!

  • @wealth_developer_researcher
    @wealth_developer_researcher 3 роки тому

    Got the right channel to learn machine learning :) . Thanks Bro.

  • @runnsingha
    @runnsingha 3 роки тому

    krish.... you are a true gem

  • @ShiVa-jy5ly
    @ShiVa-jy5ly 4 роки тому +1

    Thanks sir, all sessions are very informative.

  • @adityavipradas3252
    @adityavipradas3252 4 роки тому +1

    Really appreciate your effort and the videos. Thank you very much Krish.

  • @vishaljhaveri7565
    @vishaljhaveri7565 3 роки тому

    Thank you, Krish sir. Nice video.

  • @jovinol
    @jovinol 4 роки тому +1

    I can see his passion for machine learning

  • @raviirla459
    @raviirla459 4 роки тому +16

    Krish, can you please do videos on time series analysis, please...

  • @anilmurmu6675
    @anilmurmu6675 2 роки тому +1

    Key takeaways:
    If entropy is close to or equal to 1, the node is more impure.
    If entropy is 0, the node is considered a leaf node, i.e. a pure split.
    The split in a decision tree classification model is chosen based on the highest information gain.

  • @abhisheksurya5790
    @abhisheksurya5790 4 роки тому +2

    As per my understanding, the entire data is present in the root node, and the split happens based on feature/column values with respect to the target variable by computing entropy and information gain. If one column/feature is used at one split, that feature/column won't be used for further splitting. Correct me if I'm wrong.

  • @bishwajeetsingh8834
    @bishwajeetsingh8834 2 роки тому

    So clearly you made it. Thankyou

  • @lokesh542
    @lokesh542 4 роки тому +5

    Simply amazing loved the way you explained the concept it was really easy to understand

  • @avinashpandey6518
    @avinashpandey6518 3 роки тому

    Thank you sir, a very helpful video for me. I will definitely share it with my friends as well.

  • @DS_AIML
    @DS_AIML 4 роки тому +1

    Does it mean that, to calculate information gain, various tree structures will be created, and the structure with the highest information gain will be taken for decision tree training? How does it calculate how many tree structures to consider for the information gain? Basically, how many combinations of trees will it create, and what is the decision criterion for that?

    • @sauravmukherjeecom
      @sauravmukherjeecom 4 роки тому

      IG is calculated at each feature level while constructing the tree.
      Only a single tree is created.
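
To illustrate the reply above (a single tree, with information gain recomputed for every remaining feature at every node), here is a rough Python sketch of greedy ID3-style construction on a tiny made-up dataset; the feature names outlook/windy and the data are invented for the example.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_gain(rows, labels, feature):
        # parent entropy minus the weighted entropy of the children for one categorical feature
        n = len(labels)
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[feature], []).append(y)
        weighted = sum(len(g) / n * entropy(g) for g in groups.values())
        return entropy(labels) - weighted

    def build(rows, labels, features):
        # greedy: only one tree is grown; at each node pick the feature with the highest IG
        if len(set(labels)) == 1 or not features:
            return Counter(labels).most_common(1)[0][0]   # leaf: majority class
        best = max(features, key=lambda f: info_gain(rows, labels, f))
        node = {"feature": best, "children": {}}
        for value in set(r[best] for r in rows):
            idx = [i for i, r in enumerate(rows) if r[best] == value]
            node["children"][value] = build([rows[i] for i in idx],
                                            [labels[i] for i in idx],
                                            [f for f in features if f != best])
        return node

    rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
            {"outlook": "rain",  "windy": "no"}, {"outlook": "rain",  "windy": "yes"}]
    labels = ["Y", "N", "Y", "N"]
    print(build(rows, labels, ["outlook", "windy"]))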

  • @sugata83
    @sugata83 4 роки тому

    superb video and very nicely described..thanks Krish

  • @michaelkareev2046
    @michaelkareev2046 4 роки тому +5

    H(S) = H(f_1) = 0.94, while in the gain formula (black marker) it's 0.91

    • @rahulbagal6741
      @rahulbagal6741 3 роки тому

      Yes, it's a mistake, but H(S) is 0.94

    • @shimonadandona
      @shimonadandona Рік тому

      calculate average of all the entropies for f1, f2 and f3

    • @naveenreddyindluru7982
      @naveenreddyindluru7982 4 місяці тому

      Yes, it is a mistake, but there will not be much difference in the fraction values, so no need to worry.

  • @soumyaiter1
    @soumyaiter1 4 роки тому +3

    Information gain is the entropy of the parent minus the weighted entropy of the children
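
As a worked example of that formula in Python, using the counts that come up elsewhere in these comments (a parent node with 9 Yes / 5 No split into children with 6 Yes / 2 No and 3 Yes / 3 No):

    import math

    def H(p1, p2):
        # two-class entropy
        return -(p1 * math.log2(p1) + p2 * math.log2(p2))

    parent   = H(9/14, 5/14)                          # ≈ 0.940 (the 0.94 mentioned in the thread above)
    child_c1 = H(6/8, 2/8)                            # ≈ 0.811
    child_c2 = H(3/6, 3/6)                            # = 1.000
    weighted = (8/14) * child_c1 + (6/14) * child_c2  # ≈ 0.892
    print(parent - weighted)                          # information gain ≈ 0.048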

  • @TheLtricky
    @TheLtricky 2 роки тому

    Amazing! You explain everything so well! Thank you!

  • @mellowftw
    @mellowftw 3 роки тому

    You're no less than Andrew Ng for me.. respect++

  • @dhristovaddx
    @dhristovaddx 4 роки тому

    Very clear explanation. Great job on this video!

  • @Mahmoud-ys1kt
    @Mahmoud-ys1kt Рік тому

    Great efforts , Thanks a lot

  • @auroshisray9140
    @auroshisray9140 3 роки тому

    Beautifully explained....thank you sir!!

  • @diegobarrientos6271
    @diegobarrientos6271 2 роки тому

    Thanks a lot!, very clear explanation

  • @yashodhansatellite1
    @yashodhansatellite1 4 роки тому +1

    You are awesome Krish

  • @hanman5195
    @hanman5195 4 роки тому +1

    Ultimate Video once again Sir

  • @robertdreyfus5436
    @robertdreyfus5436 2 місяці тому

    Python script for the calculation (will work for any tree, not just binary):
    import math

    def entropy(counts):
        # counts = class counts at a node, e.g. [6, 2] for 6 Yes / 2 No
        total = sum(counts)
        return sum(-x * math.log2(x / total) / total if x > 0 else 0 for x in counts)

    def gain(*subsets):
        # subsets = class counts of each child node after the split
        sample = list(map(sum, zip(*subsets)))  # parent counts = element-wise sum of the children
        proportional_entropy = [entropy(s) * sum(s) / sum(sample) for s in subsets]
        return entropy(sample), proportional_entropy, entropy(sample) - sum(proportional_entropy)

    print(gain([6, 2], [3, 3]))
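
For the [6, 2] / [3, 3] split in the script's last line, this prints roughly (0.940, [0.464, 0.429], 0.048): the parent entropy H(S), the weighted child entropies, and the resulting information gain.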

  • @LynetHannahThomas
    @LynetHannahThomas 4 роки тому

    Good explanation; I want more videos on machine learning. Thank you so much Krish

  • @mohittahilramani9956
    @mohittahilramani9956 2 роки тому

    Sir u are a life saver ❤

  • @deltechdiaries5907
    @deltechdiaries5907 3 роки тому

    very nice explanation sir!

  • @priyasri4398
    @priyasri4398 4 роки тому

    You're just awesome at teaching online

  • @kushalhu7189
    @kushalhu7189 3 роки тому

    As Always you are the best

  • @kishore3785
    @kishore3785 3 роки тому

    excellent Explanation Sir

  • @ramarajudatla229
    @ramarajudatla229 4 роки тому +1

    thanks for nice explanation

  • @jaydipnigul3159
    @jaydipnigul3159 4 роки тому

    Very well explained ,really helpful .🤗

  • @143balug
    @143balug 4 роки тому +1

    Simply say Superb, thank you

  • @jaianushanu
    @jaianushanu 3 роки тому +1

    First of all, a big thanks to you, as you have made learning very easy and interesting :) If the information gain on one leaf node is calculated as 1 and on the other leaf node as 0.4 (any value less than 1), then which leaf node should be considered?

    • @HShravzP
      @HShravzP 3 роки тому

      based on what he said in video from 4:22 to 5:06 , I think the leaf node with 0.4 is better to consider

    • @parveenparveen9384
      @parveenparveen9384 3 роки тому +1

      @@HShravzP He was talking about entropy in that part, but the question was asked about information gain. We have to select the node with the higher information gain value.

    • @HShravzP
      @HShravzP 3 роки тому

      @@parveenparveen9384 okay understood👍

    • @ankitchoudhary5585
      @ankitchoudhary5585 3 роки тому

      Leaf nodes do not have information gain; leaf nodes have entropy, and the node just before the leaf node has information gain for the split. If a node is going to be split further, the split is evaluated on the basis of the information gain from its child nodes, and if a child node is also going to be split, it is likewise evaluated on the basis of the information gain from its own children. This process repeats until we get a node that needs no further splitting (a leaf node). Because this greed for achieving purity by splitting nodes makes trees prone to overfitting the training data, we use different parameters to control the growth of the tree, stop overfitting, and get a tree model that generalizes better to unseen data.
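
The growth-control parameters mentioned at the end of that reply can be sketched with scikit-learn, assuming it is installed; the dataset and hyperparameter values below are illustrative only.

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # criterion="entropy" makes splits use information gain;
    # max_depth and min_samples_split limit the greedy splitting described above
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, min_samples_split=10)
    clf.fit(X, y)
    print(clf.score(X, y))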

  • @chandrasekhargogula7991
    @chandrasekhargogula7991 3 роки тому

    Here H(S) is about the target variable; then take the difference between the entropy of the target and each feature's weighted average entropy to see where to split.

  • @mahroozekiani6392
    @mahroozekiani6392 4 роки тому

    this video was very useful. But please solve a question with attributes.

  • @ompatil3620
    @ompatil3620 6 місяців тому

    nice explanation
    sir

  • @mukulsharma9673
    @mukulsharma9673 4 роки тому +2

    You said that 0 entropy is the worst and that information gain is actually finding the average of the entropy of the whole structure; then, according to your definition, the lower the information gain the better it should be,
    but at the end you said the more the IG the better.
    You have contradicted your statements... Please explain that correctly.

    • @sauravmukherjeecom
      @sauravmukherjeecom 4 роки тому +1

      0 entropy is the best. In any scenario we want to minimize the entropy.
      At each level we want to maximize the information gain, because that takes us the fastest from the high entropy we have right now to the low entropy we want to reach.

  • @sarabjeetsingh5033
    @sarabjeetsingh5033 4 роки тому +2

    Hi Krish, it would be more precise to use probability of + than percentage of +.

  • @resurrectingsynergyproduct756
    @resurrectingsynergyproduct756 3 роки тому

    Great explanation

  • @ShivamKumar-em9nr
    @ShivamKumar-em9nr 4 роки тому +1

    deserves million views

  • @patrickbateman7665
    @patrickbateman7665 4 роки тому +1

    Thanks Alot Krish :)

  • @brave_v
    @brave_v 2 роки тому

    Thank you for the video Krish! When RF uses the Gini index, is it just substituting the Gini index for H(S) in the information gain formula? In other words, does the information gain concept still apply when using the Gini index?
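
On the Gini question: as far as I know, in CART-style trees (the kind scikit-learn's random forest builds, with criterion="gini" by default), Gini impurity takes the place of H(S) and the same parent-minus-weighted-children idea scores a split. A minimal sketch, reusing the [9, 5] → [6, 2], [3, 3] counts from earlier comments as an illustration:

    def gini(counts):
        # Gini impurity: 1 - sum of squared class probabilities
        total = sum(counts)
        return 1 - sum((c / total) ** 2 for c in counts)

    def gini_gain(parent, children):
        n = sum(parent)
        weighted = sum(sum(ch) / n * gini(ch) for ch in children)
        return gini(parent) - weighted

    print(gini_gain([9, 5], [[6, 2], [3, 3]]))  # ≈ 0.031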

  • @Letsbetog
    @Letsbetog 3 роки тому

    Great sir

  • @mbhazimangoveni3068
    @mbhazimangoveni3068 Рік тому

    Man, you rock!

  • @louerleseigneur4532
    @louerleseigneur4532 3 роки тому

    Thanks Krish

  • @phoebemagdy1554
    @phoebemagdy1554 2 роки тому

    Thanks for the highly informative tutorials.
    My question to you: is there any option in DecisionTreeClassifier in sklearn to make a node split into three child nodes when the feature used for splitting is categorically coded as (0, 1, 2) for the three categories?
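
As far as I know, scikit-learn's DecisionTreeClassifier only makes binary splits, so a direct three-child split on a 0/1/2-coded feature is not available. A common workaround is to one-hot encode the column so each category can be isolated by its own yes/no split; a small sketch with made-up data:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    df = pd.DataFrame({"color":  [0, 1, 2, 1, 0, 2],   # categorical feature coded 0/1/2
                       "target": [1, 0, 1, 0, 1, 0]})

    # one binary 0/1 column per category, so the tree can isolate each category
    # with its own split instead of treating 0/1/2 as an ordered number
    X = pd.get_dummies(df["color"], prefix="color")
    clf = DecisionTreeClassifier(criterion="entropy").fit(X, df["target"])
    print(X.columns.tolist(), clf.tree_.node_count)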

  • @joannewardell8396
    @joannewardell8396 2 роки тому

    How do we determine the leaf nodes? Or rather, how do we determine where to put the labels?

  • @shubhamgosavi6703
    @shubhamgosavi6703 4 роки тому

    Great understanding, thank you

  • @chaitanyasonavane3871
    @chaitanyasonavane3871 4 роки тому

    Thank you so much sir, helped a lot

  • @shindepratibha31
    @shindepratibha31 4 роки тому

    I think f1 is divided into 8 yes / 6 no.
    Another thing: initially we took f1 as the root node and divided it into f2 and f3. If this split gives the highest information gain, then we proceed to the next split of f2 and f3. Similarly, information gain will be calculated for the next split of f2 and f3 by treating f2 and f3 as root nodes, and the process goes on till we reach the leaf nodes. Is this understanding correct? Please reply and correct me if I am wrong.

    • @sauravmukherjeecom
      @sauravmukherjeecom 4 роки тому

      Yes it is correct.
      Information gain is calculated at each feature level.

  • @dianafarhat9479
    @dianafarhat9479 10 місяців тому

    Thank you!

  • @sarahschlund4750
    @sarahschlund4750 3 роки тому

    You are amazing!

  • @arpit8273
    @arpit8273 4 роки тому

    great video. Keep going.

  • @codingworld6151
    @codingworld6151 2 роки тому

    Kindly create playlist on computer vision

  • @mdiftekhar6876
    @mdiftekhar6876 3 роки тому

    Thank you so much

  • @naveenchauhanindian
    @naveenchauhanindian 4 роки тому

    Hi Krish, this is very helpful for understanding the famous paper "Induction of Decision Trees" by J.R. Quinlan. One question in my mind: do we need to convert the features into qualitative values? If yes, then we need to generate clusters too. If I'm right, then how do we decide the number of clusters? Because my data is purely quantitative in nature.

    • @sauravmukherjeecom
      @sauravmukherjeecom 4 роки тому

      You do not need to convert the features into numerical values if you are doing a classification problem. However, you will need to convert them into some sort of continuous values in regression.
      Features will just be the nodes here on which we check the class of the samples. You do not need to transform it if you are just considering classification

  • @sonamkori8169
    @sonamkori8169 4 роки тому

    Thank you 😊

  • @sahebganguly4867
    @sahebganguly4867 4 роки тому +2

    Sir, the H(S) value was 0.94, but why is there 0.91 in the formula?

    • @akjcool
      @akjcool 4 роки тому

      I think he mistakenly wrote that; it should be 0.94

  • @Devpatel-oi1er
    @Devpatel-oi1er 2 роки тому +1

    Krish, why have you not applied the decision tree practically in Python?

  • @arpitcruz
    @arpitcruz 4 роки тому +4

    Sir complete the RNN playlist please

    • @krishnaik06
      @krishnaik06  4 роки тому +3

      Hi @Arpit next video is RNN only

  • @originalgamer4962
    @originalgamer4962 3 роки тому

    but why would we start our tree from f2 when we know the entropy of f1 is smaller than the entropy of f2 ?

  • @adiflorense1477
    @adiflorense1477 3 роки тому

    2:58 Sir Krish, does the symbol P mean probability?

  • @andrewwilliam2209
    @andrewwilliam2209 4 роки тому

    Hey Krish, for the information gain, will it count all of the subsets down to the leaf nodes? Let's say here we want to find the information gain for f1, but f2 splits further into f4 and f5; will the information gain be calculated based on f2 and f3 only, or will it use f4, f5, and f3?

  • @sreelalb1729
    @sreelalb1729 4 роки тому +4

    Hi, thanks for the video. While explaining entropy in the beginning section, you said P+ and P- are the percentages of positive and negative values respectively. Is that correct? Should they be defined as the probabilities of positive and negative values rather than percentages?
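
For reference, with p+ and p- read as probabilities (fractions of the samples) rather than percentages, the entropy being discussed is H(S) = -p+ · log2(p+) - p- · log2(p-), where p+ = (number of Yes samples) / (total samples); a percentage would only give the same numbers after dividing by 100.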

  • @143balug
    @143balug 4 роки тому +1

    Krish, Can you make some videos on PyTorch

  • @tanwar_rahul19
    @tanwar_rahul19 4 роки тому +2

    Hey Krish, I have done pandas, matplotlib, and seaborn; what should come next? Please help, I am confused.

    • @krishnaik06
      @krishnaik06  4 роки тому +2

      Follow the playlist given in the link

  • @adiflorense1477
    @adiflorense1477 3 роки тому

    6:54 Sir, what is the difference between the entropy of the class computed at the initial separation and the entropy of each attribute?

    • @mohammedameen3249
      @mohammedameen3249 3 роки тому

      They are same

    • @adiflorense1477
      @adiflorense1477 3 роки тому

      @@mohammedameen3249 Nope. I thought the class entropy is the entropy before the split, and each attribute's entropy is the entropy after the split.

  • @help2office427
    @help2office427 3 роки тому

    P+ is not the percentage of Yes values; P+ is the fraction of Yes values.

  • @rajkumarbatchu19
    @rajkumarbatchu19 4 роки тому

    Hi Krish, your videos are good. Can you please make a video on different feature selection methods, i.e. filter, wrapper, and embedded methods, together in detail? Thanks in advance.

  • @shanbhag003
    @shanbhag003 3 роки тому

    Does the decision tree use a "one vs rest" mechanism for calculating entropy in multi-class classification?
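
As far as I know, no one-vs-rest step is needed: the entropy sum simply runs over all classes, each contributing one -p·log2(p) term. A minimal Python sketch with made-up labels:

    import math
    from collections import Counter

    def entropy(labels):
        # multi-class entropy: every class k contributes -p_k * log2(p_k)
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    print(entropy(["A", "A", "B", "C"]))  # three classes -> 1.5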

  • @robertelizondo7841
    @robertelizondo7841 3 роки тому

    The goat fr

  • @akshithareddy5491
    @akshithareddy5491 3 роки тому

    Please show the implementation of a decision tree for some dataset

  • @Brownchickonyoutube
    @Brownchickonyoutube 4 роки тому +3

    Hello Sir, could you please do a project using logistic regression with strings?

    • @ankitchoudhary5585
      @ankitchoudhary5585 3 роки тому

      By strings, did you mean categorical data?
      Examples of categorical columns/features:
      Gender (male, female, trans, null),
      MaritalStatus (married, not married, divorced, widowed, null).
      If yes, then use encoding techniques to convert the categorical variables into integers.
      There are various techniques, and each technique has its own explanation and criteria for its use; figure out which will be a good fit for your case.
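
A minimal sketch of the encoding idea in that reply, using pandas and scikit-learn with made-up column names and values; one-hot encoding is just one of the techniques mentioned, and others (ordinal, target encoding, etc.) have their own trade-offs.

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    df = pd.DataFrame({"gender":  ["male", "female", "female", "male"],
                       "marital": ["married", "divorced", "married", "widowed"],
                       "target":  [1, 0, 1, 0]})

    # turn each string category into its own 0/1 column before fitting the model
    X = pd.get_dummies(df[["gender", "marital"]])
    model = LogisticRegression().fit(X, df["target"])
    print(model.predict(X))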