Machine Learning-Bias And Variance In Depth Intuition| Overfitting Underfitting

  • Published 5 Oct 2024

COMMENTS • 456

  • @heartsbrohi9394
    @heartsbrohi9394 4 years ago +189

    Good tutorial. My thoughts below (hope it adds to someone's understanding):
    We perform cross-validation to make sure the model has a good accuracy rate and can be used for prediction on unseen/new (test) data. To do so, we split our dataset properly into train and test sets, for example 80% for training and 20% for testing the model. This can be performed using train_test_split or K-fold (K-fold is mostly used to avoid under- and overfitting problems).
    A model is considered good when it gives high accuracy on training as well as testing data. Good accuracy on test data means the model will have good accuracy when making predictions on new or unseen data, for example data not included in the training set.
    Good accuracy also means that the values predicted by the model will be very close to the actual values.
    Bias will be low and variance will be high when the model performs well on the training data but poorly on the test data. High variance means the model cannot generalize to new or unseen data. (This is the case of overfitting.)
    If the model performs poorly (less accurate, cannot generalize) on both training data and test data, it has high bias and high variance. (This is the case of underfitting.)
    If the model performs well on both training and test data (predictions are close to the actual values for unseen data, so accuracy is high), bias will be low and variance will also be low.
    The best model must have low bias (low error rate on training data) and low variance (can generalize, low error rate on new or test data).
    (This is the case of the best-fit model.) So always aim for low bias and low variance in your models.
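The split-and-validate workflow this comment describes can be sketched with scikit-learn (a minimal sketch; the synthetic dataset and logistic-regression model are illustrative assumptions, not from the video):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Illustrative synthetic dataset (stand-in for real data)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 80/20 hold-out split, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))

# K-fold cross-validation: average accuracy over 5 shuffled folds
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("5-fold mean accuracy:", scores.mean())
```

A model whose hold-out and cross-validated accuracies are both high and close together is behaving like the "good model" the comment describes.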

    • @karthikparanthaman634
      @karthikparanthaman634 4 years ago +2

      Wonderful summary!

    • @farhathabduljabbar9879
      @farhathabduljabbar9879 3 years ago +8

      You should probably write articles because you are good at summarising concepts!
      If you have one, please do share!

    • @adegboyemoshood809
      @adegboyemoshood809 3 years ago

      Great

    • @roshanid6523
      @roshanid6523 3 years ago +1

      Very well written 👍🏻
      Thanks for sharing
      👍🏻 Consider writing blogs

    • @AnandKumar-to6ez
      @AnandKumar-to6ez 3 years ago +1

      Really very nice and well written. After watching the video, if we go through your summary, it's a stamp on our brains. Thanks to both of you for your efforts.

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago +225

    This video needs to be watched again and again. Machine learning is nothing but a proper understanding of overfitting and underfitting. Watching for the second time. Thanks Krish

    • @adipurnomo5683
      @adipurnomo5683 3 years ago +4

      Agreed!

    • @batman9937
      @batman9937 3 years ago +7

      This is what they asked me in the OLA interview, and the interviewer covered great depth on this topic only. It's pretty fundamental to ML. Sad to report they rejected me though.

    • @ashishbomble8547
      @ashishbomble8547 2 years ago

      @@batman9937 hi man, please share what other questions they asked.

    • @carti8778
      @carti8778 2 years ago +2

      @@ashishbomble8547 buy the book "Ace the Data Science Interview" by Kevin Huo and Nick Singh.

  • @rafibasha1840
    @rafibasha1840 3 years ago +38

    Hi Krish, thanks for the explanation. At 6:02 it should be high bias and low variance in the case of underfitting

    • @mohitzen
      @mohitzen 2 years ago +4

      Yes, exactly. I was looking for this comment

    • @singhvianuj
      @singhvianuj 2 years ago

      Amazing video by Krish. Thanks for pointing this out. @Krish Naik please make a note of this

    • @shailzasharma5619
      @shailzasharma5619 2 years ago

      yess!!!

    • @rohitkumar-gi8bo
      @rohitkumar-gi8bo 1 year ago

      yess

    • @ramyabc
      @ramyabc 1 year ago

      Exactly! I searched for this comment :)

  • @emamulmursalin9181
    @emamulmursalin9181 3 years ago +32

    At 06:08 it is said that for underfitted data the model has high bias and high variance. To my understanding, this is not correct.
    Variance is the complexity of a model that can capture the internal distribution of the data points in the training set. When variance is high, the model will be fitted to most (even all) of the training data points. It will result in high training accuracy and low test accuracy.
    So in summary:
    When the model is overfitted: low bias and high variance
    When the model is underfitted: high bias and low variance
    Bias: the INABILITY of the model to fit the training data
    Variance: the complexity of the model, which helps the model fit the training data.
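The overfitted/underfitted distinction summarised above can be reproduced numerically with a small polynomial-fitting sketch (the synthetic sine data and the chosen degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Hold out every third point as a "test set"
test = np.arange(x.size) % 3 == 0
x_tr, y_tr, x_te, y_te = x[~test], y[~test], x[test], y[test]

def train_test_mse(degree):
    # Fit a polynomial of the given degree on the training points only
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_te, y_te)

errors = {d: train_test_mse(d) for d in (1, 3, 15)}
for d, (tr, te) in errors.items():
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Degree 1 is too simple to capture the sine (high bias: training error stays high), while the high-degree fit has enough complexity to chase the training points (very low training error, i.e. low bias, at the cost of worse generalization).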

    • @tejanarora1221
      @tejanarora1221 2 years ago +2

      yes bro, you are correct

    • @rohitkumarchoudhary
      @rohitkumarchoudhary 2 years ago

      I also have the same doubt. @Krish Naik sir, please have a look at it.

    • @swapnilparle1391
      @swapnilparle1391 2 years ago +1

      But underfitting is supposed to have low accuracy on training data, no? Confusing!!

    • @prachinainawa3055
      @prachinainawa3055 2 years ago +1

      Have I learned the wrong definition of bias and variance from Krish sir's explanation? Now I am confused😑

    • @swapnilparle1391
      @swapnilparle1391 2 years ago

      @prachi... not at all, the concept is in the end the same

  • @westrahman
    @westrahman 4 years ago +22

    XGBoost: the answer can't be simple, but when dealing with high bias you do better feature engineering and decrease regularization, so in XGBoost we increase the depth of each tree and use other techniques to minimize the loss. So you can come to the conclusion that if proper parameters are defined (including regularization etc.) it will yield low bias and low variance
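The tuning idea in this comment (tree depth trades bias against variance in boosting) can be sketched; scikit-learn's GradientBoostingClassifier is used here as a stand-in for XGBoost, which exposes an analogous max_depth knob, and the dataset is a synthetic assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, 3, 8):
    # Deeper trees -> more capacity (lower bias on training data, more variance risk)
    clf = GradientBoostingClassifier(max_depth=depth, n_estimators=100,
                                     random_state=0).fit(X_tr, y_tr)
    results[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"max_depth={depth}: train={results[depth][0]:.2f}, "
          f"test={results[depth][1]:.2f}")
```

Shallow trees keep the model biased but stable; depth, together with regularization, is what you tune to land at low bias and low variance.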

  • @shashankverma4044
    @shashankverma4044 4 years ago +15

    This was my biggest doubt and you clarified it in so easy terms. Thank you so much Krish.

  • @vatsal_gamit
    @vatsal_gamit 4 years ago +18

    at 6:10 you made it all clear to me in just 2 lines!! Thank you for this video :)

  • @chimadivine7715
    @chimadivine7715 19 days ago

    Krish, your videos hit the nail on the head. You explained the meaning of bias and variance. Thanks a lot!

  • @ellentuane4068
    @ellentuane4068 3 years ago +8

    Can't express my gratitude enough ! Thank you for explaining it so well

  • @DS_AIML
    @DS_AIML 4 years ago +49

    Underfitting : High Bias and Low Variance
    OverFitting : Low Bias and High Variance
    and Generalized Model : Low Bias & Low Variance.
    Bias : Error from Training Data
    Variance : Error from Testing Data
    @Krish Please confirm

    • @videoinfluencers3415
      @videoinfluencers3415 4 years ago +1

      I am confused ...
      It means that underfitted model has high accuracy on testing data?

    • @akashhajare223
      @akashhajare223 4 years ago +8

      Underfitting : High Bias and HIGH Variance

    • @kiran082
      @kiran082 4 years ago +4

      @@videoinfluencers3415 I mean an underfitting model has low accuracy on testing and training data, and the difference between the training accuracy and test accuracy is very small; that's why we get low variance and high bias in underfitting models.

    • @hiteshchandra156
      @hiteshchandra156 4 years ago +1

      You are correct bro, I checked on Wikipedia and in some other sources too.
      @Krish please confirm.

    • @sindhuorigins
      @sindhuorigins 4 years ago +2

      If it makes it any clearer for other learners, here's my explanation...
      BIAS is the simplifying assumptions made by a model to make the target function (the underlying function that the ML model is trying to learn) easier to learn.
      VARIANCE refers to the changes to the estimate of the target function that occur if the dataset is changed when implementing the model.
      Considering the linear model in the example, it makes the assumption that the input and output are related linearly, causing the target function to underfit and hence giving HIGH BIAS ERROR.
      But the same model, when used with similar test data, will give quite similar results, hence giving LOW VARIANCE ERROR.
      I hope this clears the doubt.
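The VARIANCE definition in this comment (how much the estimated function changes when the dataset changes) can be illustrated by refitting the same model on repeatedly re-drawn noisy data; the sine target and the two degrees here are illustrative assumptions:

```python
import numpy as np

x = np.linspace(0, 1, 30)

def fitted_curve(degree, seed):
    # Draw a fresh noisy dataset from the same underlying function, then fit
    rng = np.random.default_rng(seed)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
    return np.polyval(np.polyfit(x, y, degree), x)

spreads = {}
for degree in (1, 15):
    runs = np.stack([fitted_curve(degree, seed) for seed in range(20)])
    # Average pointwise standard deviation across the 20 refits
    spreads[degree] = float(runs.std(axis=0).mean())
    print(f"degree {degree:2d}: average spread between fits = {spreads[degree]:.3f}")
```

The linear model barely moves between datasets (low variance error), while the high-degree fit swings with every re-draw of the data (high variance error).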

  • @ravibhat2849
    @ravibhat2849 4 years ago +21

    Beautifully explained.
    But in underfitting, the model shows high bias and low variance, not high variance.

    • @krishnaik06
      @krishnaik06  4 years ago +17

      Yes, you are right... made a minor mistake

    • @namansinghal3685
      @namansinghal3685 4 years ago

      @@krishnaik06 But then sir, you said bias is error, and in underfitting the training data error is low... so should it be low bias?

    • @ravibhat2849
      @ravibhat2849 4 years ago

      @@namansinghal3685 when there is high bias, the model misses out on certain observations, so it will be underfit.

    • @jitenderthakur697
      @jitenderthakur697 4 years ago

      @@namansinghal3685 in the case of underfitting, training error is high, not low

    • @gourav.barkle
      @gourav.barkle 3 years ago +1

      @@krishnaik06 You should pin this comment

  • @Hitesh-Salgotra
    @Hitesh-Salgotra 3 years ago +4

    Krish sir, I wholeheartedly hope God blesses you. You are doing a great job, and thanks for iNeuron, it made my life easy.

  • @vannelamallesh7635
    @vannelamallesh7635 13 hours ago

    Thank you very much sir for your clear explanation of bias, variance, underfitting and overfitting across many parameters

  • @MrAli2200
    @MrAli2200 3 years ago +1

    You can't get a clearer explanation than this, hats off mate

  • @kaushalpatwardhan
    @kaushalpatwardhan 2 years ago +2

    I have been trying to understand this concept for a long time... but never knew it's this simple 😀 Thank you Krish for this amazingly simple-to-understand explanation.

  • @garath
    @garath 3 years ago +8

    Very thorough and good explanation! Thank you.
    Side note: would like to point out that at 2:12 the degree of the polynomial is still 2 (it's still a quadratic function).

  • @dishashaktawat2219
    @dishashaktawat2219 3 years ago

    Providing this info makes you a great teacher... the way you explain, everything goes straight into the brain...

  • @chandrachalla3466
    @chandrachalla3466 2 years ago +1

    This is an awesome video - was fully confused earlier - this video made it all clear !! Thanks a lot sir !!

  • @bauyrjanjyenis3045
    @bauyrjanjyenis3045 2 years ago +1

    Very succinct explanation of the very fundamental ML concept. Thank you for the video!

  • @vinayakbachal8134
    @vinayakbachal8134 3 years ago +2

    Brother, you are really great. What I could not understand easily even after paying 2.80 lakhs in fees, you explained in 16 minutes. Kudos... amazing work dear, all the very best

  • @YashSharma-es3lr
    @YashSharma-es3lr 3 years ago

    Sir, after watching this video my confusion between bias and variance was cleared in one go. Awesome explanation

  • @kajalkush1210
    @kajalkush1210 2 years ago +1

    Your way of explanation is wow.

  • @sauravb007
    @sauravb007 4 years ago +40

    XGBoost should have low bias & low variance !

    • @brahmihassane8499
      @brahmihassane8499 3 years ago

      Not really; it will depend on how you tune the hyperparameters of the model. For this reason it is important to tune a model in order to find a compromise that ensures low bias (the capacity of the model to fit a theoretical function) and low variance (the capacity of the model to generalize)

  • @cheeku5568
    @cheeku5568 4 years ago +2

    One video, all clear content... thanks bro, it was really a nice session. You really are a low-bias and low-variance human. Keep posting such clear ML videos.

  • @milanbhor2327
    @milanbhor2327 4 months ago

    The most clear and precise information 🎉 thank you sir❤

  • @satishbanka
    @satishbanka 3 years ago

    One of the best explanations of bias and variance w.r.t. overfitting and underfitting...

  • @khursiahzainalmokhtar225
    @khursiahzainalmokhtar225 1 year ago +1

    What an excellent explanation on bias and variance. I finally understood both terms. Thank you so much for the video and keep up the good work!

  • @72akshayvasala59
    @72akshayvasala59 3 years ago

    You are really great sir... your explanation is crystal clear

  • @delwarhossain43
    @delwarhossain43 4 years ago +1

    A very important discussion of important words in ML. Thanks. An easy explanation of hard words.

  • @kanhataak1269
    @kanhataak1269 4 years ago +1

    After watching this video my doubt is clear; it really helped. Thanks for giving your precious time...

  • @yashkhandelwal6558
    @yashkhandelwal6558 3 years ago +1

    Great I learnt by watching your entire playlist.

  • @bakerkar
    @bakerkar 1 year ago

    Excellent tutorial. Better than IIT professors who are teaching machine learning.

  • @eswaranthanabslan2607
    @eswaranthanabslan2607 3 years ago

    Krish, you are a master in statistics and machine learning

  • @nevillesantamaria7487
    @nevillesantamaria7487 3 years ago

    Beautifully explained. My concepts are now clear on overfitting and underfitting models. 👍 Thanks 🍻

  • @aziausa
    @aziausa 3 years ago +4

    You have a God-given talent for teaching. You are a gem!!!!

    • @dudefromsa
      @dudefromsa 3 years ago

      I agree with your sentiment. He has such understanding that he can break down a concept in a comprehensible manner

  • @kumarssss5
    @kumarssss5 4 years ago +2

    Excellent teaching

  • @lekshmigs123
    @lekshmigs123 2 years ago

    Thank you very much for the simple and proper explanation...

  • @rajeshrajie1237
    @rajeshrajie1237 4 years ago +1

    Bias is an error on training data ,
    variance is an error on test data. Thanks for simplifying

  • @pavithrad9543
    @pavithrad9543 3 years ago

    Krish, thank you so much. This is the best channel for data science that I have ever seen. Great efforts Krish. Thanks again.

  • @jiviteshvarshney3644
    @jiviteshvarshney3644 9 months ago +2

    6:00 Small correction in your video.
    Underfitting - High Bias & Low Variance
    Overfitting - Low Bias & High Variance

  • @souravbiswas6892
    @souravbiswas6892 2 years ago

    XGBoost has the property of low bias and high variance, however it can be regularised and turned into low bias and low variance. Useful video indeed.

  • @The_Pavanputra
    @The_Pavanputra 2 years ago

    "Bias is in the training dataset and variance is in the testing dataset" - this line cost me a LinkedIn machine learning job

  • @rachitsingh4913
    @rachitsingh4913 3 years ago

    This video is great, but one thing I want to correct: bias and variance work in an inversely proportional manner. If we get high variance, bias will be low; if bias is high, then variance will be low. So in overfitting it's high variance/low bias, and in underfitting high bias/low variance.
    To be the best, a model should be low bias/low variance

  • @NiyantaVlogs
    @NiyantaVlogs 2 years ago +1

    You explained it so well sir. I was struggling with these terms, but after watching your video my concept of bias, variance, underfitting and overfitting is crystal clear. Thank you!

  • @alexsepenu
    @alexsepenu 3 years ago

    you made my work easy by this explanation. thanks.

  • @avinashankar
    @avinashankar 3 years ago

    Thanks Krish, I had scoured the net, but this understanding was great. Good memory hook! Thanks for this.

  • @abdelwahabfaroug9656
    @abdelwahabfaroug9656 2 years ago

    Very useful lecture; it helped me a lot to understand this topic in a simple and easy way. Please keep going

  • @alihamdan425
    @alihamdan425 3 years ago

    GREAT SIR I GOT IT, THANKS FOR YOUR EFFORT.

  • @shekharpandey9776
    @shekharpandey9776 4 years ago +3

    Please make a video on some mathematical terminology like gradient descent etc. You are really doing a great job.

  • @sampurngyaan2867
    @sampurngyaan2867 2 years ago

    It was a really good video and it cleared all the doubts I had.

  • @k_dash99
    @k_dash99 2 years ago

    tbh, best video on youtube about Bias And Variance.

  • @hemdatascience369
    @hemdatascience369 2 years ago

    Today I got clarity on this topic. Thank you sir

  • @anoopak007
    @anoopak007 3 years ago

    Please note that Underfitting occurs when we have HIGH BIAS and LOW VARIANCE.... except that error this video is an excellent one. Thanks.

    • @ramshaazeemi8851
      @ramshaazeemi8851 3 years ago

      In underfitting, the model performs poorly on test data as well, so why does it have low variance, if variance = test error?

    • @anoopak007
      @anoopak007 3 years ago

      As per my understanding, variance does not actually mean the test error, but the change in test error when the test data is modified. Because in underfitting the model is too generalized, even if we change the test data greatly, we still get more or less the same test error. Somebody correct me if I'm wrong.

  • @aseemjain007
    @aseemjain007 3 months ago

    Brilliantly explained !! Thank you !!

  • @IonidisIX
    @IonidisIX 10 months ago

    On the last graph you show, Error vs Degree Of Polynomials, you mixed the curves. The red one is for the training dataset whereas the blue is for the test dataset.

  • @mariorozario7649
    @mariorozario7649 2 years ago

    Thank you so much for clearly explaining this. I have tried so hard to get PhDs to explain this to me... and never got a clear answer.

  • @gunjalvis
    @gunjalvis 2 years ago

    The ultimate discussion, and the ultimate person who presented it

  • @sabaamin6442
    @sabaamin6442 2 years ago

    woow awesome, great work done in one single video. insightful

  • @yourtube92
    @yourtube92 1 year ago

    Very good video, easiest video for understanding logic of bias & variance.

  • @sidduhedaginal
    @sidduhedaginal 4 years ago +1

    This guy is really great... Thank you so much for the effort you put in for us.

  • @ashisranjanlahiri
    @ashisranjanlahiri 3 years ago +4

    Hi... your topic explanation is awesome. Just out of curiosity: how can you say bias means training error and variance means test error? Is there any intuitive explanation or mathematical derivation for that?

  • @basavarajag1901
    @basavarajag1901 2 years ago

    Excellent explanation, Krish. In the same video you gave the example of XGBoost, i.e. each model learns from the previous DT and builds on it subsequently.

  • @nahidzeinali1991
    @nahidzeinali1991 1 year ago

    You are so awesome; I love your teaching

  • @robertselemani3775
    @robertselemani3775 2 years ago

    Well articulated, thank you Krish

  • @shreyasb.s3819
    @shreyasb.s3819 4 years ago

    Superbly explained... it connected the dots for me. Thank you

  • @ManojKumar-wr3os
    @ManojKumar-wr3os 9 months ago

    XGBoost will reduce the bias as well as the variance by training subsequent models and by splitting the data. It helps us reduce underfitting.

  • @kamaleshp6154
    @kamaleshp6154 2 years ago

    Best Explanation on Bias and Variance!

  • @nandinisarker6123
    @nandinisarker6123 3 years ago

    I was really in great need of such an excellent explanation of bias and variance. Great help!

  • @malikumarhassan1
    @malikumarhassan1 2 years ago

    Perfectly explained, sir

  • @yashkhilavdiya5693
    @yashkhilavdiya5693 2 years ago

    Thank You so much Krish Sir..!!

  • @khmercuber5570
    @khmercuber5570 3 years ago

    Thanks Mr. Krish for your best explanation, now I can clearly understand about Bias and Variance :D

  • @vipindube5439
    @vipindube5439 4 years ago +4

    For XGBoost: low bias and high variance at the start; at the end it's low variance and low bias. (Extreme Gradient Boosting)

    • @Prajwal_KV
      @Prajwal_KV 4 years ago

      Then what is the difference between random forest and XGBoost? What is the need for XGBoost when we can solve the problem using random forest?

    • @HARDYBOY290988
      @HARDYBOY290988 3 years ago +1

      @@Prajwal_KV Regularization is there in XGBoost

  • @teegnas
    @teegnas 4 years ago +2

    I really love his in-depth intuition videos ... compared to his plethora of videos!

  • @devasheeshvaid9057
    @devasheeshvaid9057 4 years ago +4

    Hi @Krish
    I read the following in a resource:
    "Bias refers to the gap between the value predicted by your model and the actual value of the data. In the case of high bias, your predictions are likely to be skewed in a particular direction away from the actual values. Variance describes how scattered your predicted values are in relation to each other."
    This doesn't imply bias as the training data error and variance as the test data error. Am I missing any point here? Please elaborate.

    • @sandygaddam
      @sandygaddam 4 years ago +1

      Hi Devasheesh,
      Variance occurs when the model performs well on the trained dataset but does not do well on a dataset it was not trained on, like a test or validation dataset. Variance tells us how scattered the predicted values are from the actual value. For easier understanding of the concept, we can take it as the test or validation data error.
      Bias is how far the predicted values are from the actual values. If the average predicted values are far off from the actual values then the bias is high.
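The reply above distinguishes bias (average offset of predictions from the actual values) from variance (scatter of predictions across retrainings). A Monte-Carlo sketch of that decomposition at a single query point (the synthetic sine target and the two degrees are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0, 1, 25)
x0 = 0.25                                  # point at which we inspect predictions

preds = {1: [], 9: []}
for _ in range(300):                       # many independent training sets
    y = true_f(x) + rng.normal(scale=0.3, size=x.size)
    for degree in preds:
        preds[degree].append(np.polyval(np.polyfit(x, y, degree), x0))

stats = {}
for degree, p in preds.items():
    p = np.asarray(p)
    bias_sq = float((p.mean() - true_f(x0)) ** 2)   # squared average offset from truth
    var = float(p.var())                            # scatter of predictions
    stats[degree] = (bias_sq, var)
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={var:.4f}")
```

The rigid linear model is consistently wrong in the same direction (large bias, small scatter), while the flexible model is right on average but its predictions scatter more from one training set to the next.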

  • @dharmamaharjan1354
    @dharmamaharjan1354 4 years ago +1

    XGBoost uses LASSO and Ridge regularization to prevent overfitting(low bias and high variance)
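The L1 (LASSO) and L2 (Ridge) penalties this comment mentions can be seen shrinking coefficients in a plain linear-regression setting (a sketch of the same regularization idea, not XGBoost itself; the data, polynomial degree, and alpha values are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.3, size=30)
X = PolynomialFeatures(degree=12).fit_transform(x)   # flexible, overfit-prone basis

coefs = {}
for name, model in [("none", LinearRegression()),
                    ("L2/Ridge", Ridge(alpha=1.0)),
                    ("L1/Lasso", Lasso(alpha=0.01, max_iter=100_000))]:
    coefs[name] = model.fit(X, y).coef_
    print(f"{name:9s} max |coef| = {np.abs(coefs[name]).max():.2f}")
```

The L2 penalty shrinks all weights toward zero, while the L1 penalty drives many to exactly zero; XGBoost applies the same two penalties to its leaf weights (its reg_lambda and reg_alpha parameters).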

  • @arpitadas6221
    @arpitadas6221 1 year ago

    The best... sir, please make more videos like this, on the board... it's easier to understand this way

  • @irfaanmeah9229
    @irfaanmeah9229 4 years ago +4

    Firstly, I thank Krish for making really informative videos. For model 2 in the classification problem: if the error between the training and the test set is not drastically different, it may not be a result of high bias and high variance. In a real-world scenario, we are unlikely to get a very high accuracy, i.e. >90%. I would consider a high bias and high variance problem to be, for example, training acc 75% and test acc 65%. What do you think?

    • @aravindmolugu3308
      @aravindmolugu3308 3 years ago

      It depends on the problem statement and domain of your data. If the data is clean and abundant, you can get a high accuracy above 90% too, for both test and train. It all depends on domain understanding.

  • @shreyagupta222
    @shreyagupta222 2 years ago

    Thanks a lot for the wonderful explanation

  • @vladimirkirichenko1972
    @vladimirkirichenko1972 1 year ago

    Insanely good video. Also this has amazing energy!

  • @06madhav
    @06madhav 4 years ago +2

    Thanks for this. Amazing explanation.

  • @mansishah8151
    @mansishah8151 3 years ago

    Love watching your videos... You explain very well.

  • @premprasad3511
    @premprasad3511 3 months ago +1

    How do you know so much??? You talk about machine learning as if you were born embedded with all that knowledge! God bless you with more knowledge and intelligence so that you can share it with more people!

  • @vennapoosasurendranathredd1027
    @vennapoosasurendranathredd1027 3 years ago

    Man, even though I am studying AI in my college, this is easier to understand. Thanks man.

  • @manishsharma2211
    @manishsharma2211 4 years ago +3

    Very good. Revised my concepts perfectly 🔥🔥

  • @VigneshVicky-cn8ek
    @VigneshVicky-cn8ek 3 years ago

    You nailed it man ! Great work ! Respect your time and effort!

  • @ameysawant2301
    @ameysawant2301 4 years ago

    You make one of the best tech videos on youtube !!!!

  • @saydainsheikh3123
    @saydainsheikh3123 2 years ago

    brilliant video!!!!! explained everything to the point.

  • @anujvyas9493
    @anujvyas9493 4 years ago +5

    XGBoost - Low Bias and Low Variance

  • @leeladesai
    @leeladesai 3 years ago

    My understanding is that if the model has high bias then it is underfit, irrespective of variance (high or low).
    For an underfit model (having high bias - high train error): if the test error is close to the train error then low variance; if the test error is much larger than the train error then high variance.
    Variance doesn't play a role in saying whether a model is underfit or not....

  • @adiMallya
    @adiMallya 4 years ago +1

    Thanks for revising these important concepts

  • @sandipansarkar9211
    @sandipansarkar9211 3 years ago

    Watched it once again for better clarity. Thanks

  • @benyaminem.9385
    @benyaminem.9385 11 months ago

    Thank you so much bro ! So clear !!!

  • @tagoreji2143
    @tagoreji2143 2 years ago

    Thanks so much, Sir. Very valuable information

  • @Cat_Sterling
    @Cat_Sterling 3 years ago

    Awesome video, thank you so much for these wonderful explanations, they are much needed!

  • @basharb5215
    @basharb5215 2 years ago

    Great explanation. Thank you so much!

  • @brahimferjani3147
    @brahimferjani3147 3 years ago

    Good pedagogy and easy explanation. Thanks a lot

  • @donkeshwarkavyasree8632
    @donkeshwarkavyasree8632 2 years ago

    The best explanation among the whole youtube channels 👏. I love the way how you always keep things simple. Glad to find out about your channel, sir.

  • @sharawyabdul6222
    @sharawyabdul6222 3 years ago

    Very well explained. Thanks

  • @janakiraam1
    @janakiraam1 3 years ago

    Clear explanation. @krish sir thanks for making this video

  • @sagarmestry5514
    @sagarmestry5514 4 years ago +4

    Superb job sir... it's the easiest explanation I've seen regarding this topic... hope you'll upload a video regarding Gradient Descent too :)