Regularization - Explained!

  • Published 17 Dec 2024
  • We will explain Ridge, Lasso and a Bayesian interpretation of both.
    ABOUT ME
    ⭕ Subscribe: www.youtube.co...
    📚 Medium Blog: / dataemporium
    💻 Github: github.com/ajh...
    👔 LinkedIn: / ajay-halthor-477974bb
    RESOURCES
    [1] Graphing calculator to plot nice charts: www.desmos.com
    [2] Refer to Section 6.2, "Shrinkage Methods", for mathematical details: hastie.su.doma...
    [3] Karush-Kuhn-Tucker conditions for constrained optimization with inequality constraints: en.wikipedia.o...
    [4] Stack Exchange discussions on [3]: stats.stackexc...
    [5] Proof of ridge regression: stats.stackexc...
    [6] Laplace distribution (or double exponential distribution) used for lasso prior: en.wikipedia.o...
    [7] @ritvikmath's amazing video on the Bayesian interpretation of Lasso and Ridge regression: • Bayesian Linear Regres...
    [8] Distinction between Maximum Likelihood Estimation and Maximum A Posteriori Estimation: agustinus.kris...
    MATH COURSES (7 day free trial)
    📕 Mathematics for Machine Learning: imp.i384100.ne...
    📕 Calculus: imp.i384100.ne...
    📕 Statistics for Data Science: imp.i384100.ne...
    📕 Bayesian Statistics: imp.i384100.ne...
    📕 Linear Algebra: imp.i384100.ne...
    📕 Probability: imp.i384100.ne...
    OTHER RELATED COURSES (7 day free trial)
    📕 ⭐ Deep Learning Specialization: imp.i384100.ne...
    📕 Python for Everybody: imp.i384100.ne...
    📕 MLOps Course: imp.i384100.ne...
    📕 Natural Language Processing (NLP): imp.i384100.ne...
    📕 Machine Learning in Production: imp.i384100.ne...
    📕 Data Science Specialization: imp.i384100.ne...
    📕 TensorFlow: imp.i384100.ne...

COMMENTS • 29

  • @ashishanand9642
    @ashishanand9642 6 months ago +3

    Why is this so underrated? This should be on everyone's playlist for linear regression.
    Hats off, man :)

  • @data_quest_studio4944
    @data_quest_studio4944 2 years ago +8

    My man looks sharp and dapper

    • @CodeEmporium
      @CodeEmporium  2 years ago +1

      Haha. Thanks! I think this shirt looked better on camera than in person. :)

  • @blairnicolle2218
    @blairnicolle2218 7 months ago

    Excellent videos! Great graphing for the intuition of L1 regularization, where parameters become exactly zero (9:45), as compared with the behavior of L2 regularization.
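    A minimal sketch of that contrast (not from the video; it assumes scikit-learn is installed, and the synthetic data below is purely illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    # Synthetic data: 10 features, only the first 3 actually matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

    lasso = Lasso(alpha=0.5).fit(X, y)   # L1 penalty
    ridge = Ridge(alpha=0.5).fit(X, y)   # L2 penalty

    # Lasso drives the irrelevant coefficients to exactly 0.0; Ridge only shrinks them.
    print("Lasso:", np.round(lasso.coef_, 3))
    print("Ridge:", np.round(ridge.coef_, 3))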

  • @paull923
    @paull923 1 year ago +2

    I had to watch it twice to truly digest it, but I like your approach to the contour plot in particular. I hope to boost your channel with my comments a tiny bit ;). tyvm!
    What I was taught, and what is helpful to know imo:
    1) On an abstract level, what regularization achieves is that it penalizes overly complex (e.g. high-degree) terms.
    2) The terms L1- and L2-regularization are worth mentioning, and when you talk about the "Gaussian" prior for Ridge, you could also name the "Laplace" distribution (rather than "double exponential") as the prior for Lasso regression (sketched below).
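    A rough sketch of that Bayesian view in LaTeX (my notation, following [7] and [8] from the description; assuming a Gaussian likelihood y_i ~ N(x_i^T θ, σ²)):

    % Gaussian prior \theta_j \sim N(0, \tau^2) gives the Ridge penalty:
    -\log p(\theta \mid y) \;\propto\; \sum_i (y_i - x_i^\top \theta)^2 + \frac{\sigma^2}{\tau^2}\,\|\theta\|_2^2
    % Laplace prior \theta_j \sim \mathrm{Laplace}(0, b) gives the Lasso penalty:
    -\log p(\theta \mid y) \;\propto\; \sum_i (y_i - x_i^\top \theta)^2 + \frac{2\sigma^2}{b}\,\|\theta\|_1

    So the MAP estimate under a Gaussian prior is Ridge, and under a Laplace prior it is Lasso.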

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      Thanks so much for your comments Paul! And yea, I feel like I have seen similar contour plots in books but never truly understood “why” they were like that until I started diving into details myself. Hopefully in the future I can explain it in a way that you’d be able to get it in a single pass through the video too :)

  • @ajaytaneja111
    @ajaytaneja111 2 years ago +6

    Hi Ajay, great video, as always. One suggestion, with your permission ;) I think it might be worthwhile introducing the concept of regularization by comparing feature elimination (which is equivalent to making the weight zero) vs reducing the weight (which is regularization), elaborating on this, and then drifting towards Lasso and Ridge. ;)

  • @NicholasRenotte
    @NicholasRenotte 2 years ago

    Well hello everyone right back at you Ajay! These are fire, the live viz is on point!

    • @CodeEmporium
      @CodeEmporium  2 years ago +1

      Thank you for noticing ma guy. I will catch up to the 100K gang soon. Pls wait for me 😂

    • @NicholasRenotte
      @NicholasRenotte 2 years ago

      @@CodeEmporium 😂 you're one hunnit in my eyes 🙏

  • @ivanalejandrogarciaramirez8976
    @ivanalejandrogarciaramirez8976 4 months ago

    Thank you very much for this answer; I have been looking for it for a while: 7:42

  • @jpatel2002
    @jpatel2002 4 months ago

    8:27 yi < -(lambda/2)

  • @cormackjackson9442
    @cormackjackson9442 8 months ago

    Such an awesome video! Can't believe I hadn't made the connection between Ridge and Lagrangians; it literally has a lambda in it, lol!

    • @cormackjackson9442
      @cormackjackson9442 8 months ago

      With the lasso intuition, in the stepwise function you get for theta, how do you get the conditions on the right, i.e. yi < lambda/2? I thought that instead of writing theta < 0, you are just using the implied relationship between yi and lambda. E.g. if theta < 0, then |theta| = -theta, which after optimising gives theta = y - lambda/2, i.e. y = lambda/2 + theta. But then I get the opposite conditions to yours... i.e. since theta is negative in this case, wouldn't that give y = lambda/2 + theta < lambda/2?
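      For what it's worth, a short worked version of the 1-D case (assuming the objective in the video is f(\theta) = (y - \theta)^2 + \lambda|\theta|; notation is mine):

      \text{If } \theta > 0:\; f(\theta) = (y-\theta)^2 + \lambda\theta,\quad f'(\theta) = -2(y-\theta) + \lambda = 0 \;\Rightarrow\; \theta = y - \lambda/2,\ \text{consistent only when } y > \lambda/2.
      \text{If } \theta < 0:\; f(\theta) = (y-\theta)^2 - \lambda\theta,\quad f'(\theta) = -2(y-\theta) - \lambda = 0 \;\Rightarrow\; \theta = y + \lambda/2,\ \text{consistent only when } y < -\lambda/2.
      \text{Otherwise } \hat\theta = 0,\ \text{so}\ \hat\theta = \mathrm{sign}(y)\,\max(|y| - \lambda/2,\ 0).

      Note that the theta < 0 branch gives theta = y + lambda/2 (not y - lambda/2), which is where the condition y < -lambda/2 comes from, matching the note at 8:27 above.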

  • @lucianofloripa123
    @lucianofloripa123 5 months ago

    Good explanation!

  • @sivakrishna5530
    @sivakrishna5530 1 year ago

    Always find interesting things here, keep going. Good luck.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Hah! Glad that is the case. I am here to pique that interest :)

  • @chadx8269
    @chadx8269 1 year ago +1

    Nice explanation of the Bayesian view.
    Isn't regularization just a Lagrange multiplier? The optimum point is where the gradient of the constraint is proportional to the gradient of the cost function.

    • @abhirajarora7631
      @abhirajarora7631 8 months ago

      They are written in the same mathematical form, but they are not the same. Lagrange multipliers are used when you need to min/max a given function subject to a constraint, and you then solve for the value of lambda; in regularization, we set the lambda value ourselves. Regularization gives us a penalty if we take steps in a non-minimizing direction, and thus lets us move back toward the correct direction in the following iteration.
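      A compact way to see the connection (my sketch, in the spirit of refs [3] and [5] from the description):

      \min_\theta \sum_i (y_i - x_i^\top \theta)^2 \;\;\text{s.t.}\;\; \|\theta\|_2^2 \le t \qquad \text{(constrained form)}
      \min_\theta \sum_i (y_i - x_i^\top \theta)^2 + \lambda \|\theta\|_2^2 \qquad \text{(penalized / Ridge form)}

      By the KKT conditions, every constraint level t >= 0 corresponds to some multiplier lambda >= 0 with the same minimizer; the practical difference is that in regularization we pick lambda up front (e.g. by cross-validation) instead of solving for the multiplier of a given constraint.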

  • @fujinzhou7150
    @fujinzhou7150 1 year ago

    Love your awesome videos! Salute! Thank you so much!

    • @CodeEmporium
      @CodeEmporium  1 year ago

      You are so welcome! I am happy this helps

  • @alexandergeorgiev2631
    @alexandergeorgiev2631 1 year ago

    How does Gauss-Newton for nonlinear regression change with (L2) regularization?

  • 1 year ago

    Nice video, thanks! The only thing I think is slightly misleading is describing polynomials of increasing degree as "complex". Since you are talking about maths, I was expecting to see the imaginary unit when I first heard "complex".

  • @TheRainHarvester
    @TheRainHarvester 2 years ago

    Great content on your channel. I just found it! Heh, I used Desmos to debug/visualize too!
    I just added a video explaining easy multilayer backpropagation. The book math with all the subscripts is confusing, so I did it without any. Much simpler to understand.

    • @CodeEmporium
      @CodeEmporium  2 years ago +1

      Thank you! And solid work on that explanation :)

  • @mathwithsidiqat
    @mathwithsidiqat 4 months ago

    Thank you

  • @kakunmaor
    @kakunmaor 1 year ago

    AWESOME!!!!! thanks!

  • @lijinhui6902
    @lijinhui6902 1 year ago

    thx !
