The Reparameterization Trick

  • Published May 14, 2024
  • This video covers what the reparameterization trick is and when we use it. It also explains the trick from a mathematical/statistical perspective.
    CHAPTERS:
    00:00 Intro
    00:28 What/Why?
    08:17 Math

COMMENTS • 36

  • @carlosgruss7289
    @carlosgruss7289 1 day ago

    Very good explanation thank you

  • @mohammedyasin2087
    @mohammedyasin2087 8 months ago +6

    This was the analogy I got from ChatGPT to understand the problem 😅. Hope it's useful to someone:
    "Certainly, let's use an analogy involving shooting a football and the size of a goalpost to explain the reparameterization trick:
    Imagine you're a football player trying to score a goal by shooting the ball into a goalpost. However, the goalpost is not of a fixed size; it varies based on certain parameters that you can adjust. Your goal is to optimize your shooting technique to score as many goals as possible.
    Now, let's draw parallels between this analogy and the reparameterization trick:
    1. **Goalpost Variability (Randomness):** The size of the goalpost represents the variability introduced by randomness in the shooting process. When the goalpost is larger, it's more challenging to score, and when it's smaller, it's easier.
    2. **Shooting Technique (Model Parameters):** Your shooting technique corresponds to the parameters of a probabilistic model (such as `mean_p` and `std_p` in a VAE). These parameters affect how well you can aim and shoot the ball.
    3. **Optimization:** Your goal is to optimize your shooting technique to score consistently. However, if the goalpost's size (randomness) changes unpredictably every time you shoot, it becomes difficult to understand how your adjustments to the shooting technique (model parameters) are affecting your chances of scoring.
    4. **Reparameterization Trick:** To make the optimization process more effective, you introduce a fixed-size reference goalpost (a standard normal distribution) that represents a known level of variability. Every time you shoot, you still adjust your shooting technique (model parameters), but you compare your shots to the reference goalpost.
    5. **Deterministic Transformation:** This reference goalpost allows you to compare and adjust your shooting technique more consistently. You're still accounting for variability, but it's structured and controlled. Your technique adjustments are now more meaningful because they're not tangled up with the unpredictable variability of the changing goalpost.
    In this analogy, the reparameterization trick corresponds to using a reference goalpost with a known size to stabilize the optimization process. This way, your focus on optimizing your shooting technique (model parameters) remains more effective, as you're not constantly grappling with unpredictable changes in the goalpost's size (randomness)."

    • @safau
      @safau 6 months ago

      oh my god !! So good.

    • @metalhead6067
      @metalhead6067 4 months ago

      damn nice bro, thank you for this
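
In code, the deterministic transformation this analogy points at is usually a single line: draw a fixed-size noise sample and shift/scale it by the model's outputs. A minimal sketch (assuming PyTorch; the `mean_p`/`std_p` names are taken from the comment above, and the values are made up):

```python
import torch

# Encoder outputs for one input (hypothetical values): these are the
# parameters we want gradients for.
mean_p = torch.tensor([0.5, -1.0], requires_grad=True)
std_p = torch.tensor([1.2, 0.3], requires_grad=True)

# Reparameterization: all randomness lives in eps, which needs no gradient
# (the fixed "reference goalpost" from the analogy).
eps = torch.randn_like(std_p)   # eps ~ N(0, I)
z = mean_p + std_p * eps        # deterministic in mean_p and std_p

# Any loss computed from z now backpropagates into mean_p and std_p.
loss = (z ** 2).sum()
loss.backward()
print(mean_p.grad, std_p.grad)
```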

  • @s8x.
    @s8x. 1 month ago +1

    WOW! THANK U. FINALLY MAKING IT EASY TO UNDERSTAND. WATCHED SO MANY VIDEOS ON VAE AND THEY JUST BRIEFLY GO OVER THE EQUATION WITHOUT EXPLAINING

  • @user-hy4kl3my6h
    @user-hy4kl3my6h 7 months ago

    Very nice video, it helped me a lot. Finally someone explaining math without leaving the essential parts aside.

  • @abdelrahmanahmad3054
    @abdelrahmanahmad3054 10 months ago +1

    This is a life changing video, thank you very much 😊 🙏🏻

  • @chasekosborne2
    @chasekosborne2 6 months ago

    Thank you for this video, this has helped a lot in my own research on the topic

  • @advayargade746
    @advayargade746 3 months ago

    Sometimes understanding the complexity makes a concept clearer. This was one such example. Thanks a lot.

  • @ettahiriimane4480
    @ettahiriimane4480 5 months ago +1

    Thank you for this video, this has helped me a lot

  • @MonkkSoori
    @MonkkSoori 1 year ago +4

    Thank you for your effort, it all tied up nicely at the end of the video. This was clear and useful.

  • @amaramouri9137
    @amaramouri9137 1 year ago +2

    Good job, and thanks for your effort. I hope we see more videos in the future.

  • @slimanelarabi8147
    @slimanelarabi8147 4 months ago

    Thanks, this is a good explanation of the confusing part of VAEs

  • @amirnasser7768
    @amirnasser7768 8 months ago

    Thank you, I liked your intuition, amazing effort.

    • @amirnasser7768
      @amirnasser7768 8 months ago

      Also, please correct me if I am wrong, but I think at minute 17 you should not use the same theta notation for both "g_theta()" and "p_theta()", since you assumed you do not know the theta parameters (the main cause of the differentiation problem) for "p()" but you do know the parameters for "g()".
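
For reference, the original VAE paper (Kingma & Welling) does keep the two sets of parameters separate along the lines this comment suggests: the variational parameters (usually written phi) parameterize both the approximate posterior and the transformation g, while theta parameterizes the decoder. Roughly, in that notation:

```latex
z = g_\phi(\epsilon, x) = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad
\mathbb{E}_{q_\phi(z \mid x)}\big[ f(z) \big]
  = \mathbb{E}_{p(\epsilon)}\big[ f\big( g_\phi(\epsilon, x) \big) \big].
```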

  • @liuzq7
    @liuzq7 10 months ago

    Holy God. What a great teacher..

  • @salahaldeen1751
    @salahaldeen1751 1 year ago +1

    Thank you so much! Please continue with more videos on ML.

    • @raneisenberg2155
      @raneisenberg2155 1 year ago

      Will do :) let me know if you have a specific topic in mind.

  • @Gus-AI-World
    @Gus-AI-World 1 year ago +3

    Beautifully said. Love how you laid out things, both the architecture and math. Thanks a million.

  • @HuuPhucTran-jt4rk
    @HuuPhucTran-jt4rk 1 year ago +2

    Your explanation is brilliant! We need more things like this. Thank you!

  • @jinyunghong
    @jinyunghong 1 year ago +1

    Thank you so much for your video! It definitely saved my life :)

  • @wodniktoja8452
    @wodniktoja8452 1 day ago

    dope

  • @my_master55
    @my_master55 1 year ago +1

    Thanks for the vid 👋
    Actually lost the point in the middle of the math explanation, but that's prob because I'm not that familiar with VAEs and don't know some skipped tricks 😁
    I guess for the field guys it's a bit more clear :)

    • @raneisenberg2155
      @raneisenberg2155 1 year ago

      Thank you very much for the positive feedback 😊.
      Yes, the math part is difficult to understand and took me a few tries until I eventually figured it out. Feel free to ask any question about unclear aspects and I will be happy to answer here in the comments section.

  • @user-td8vz8cn1h
    @user-td8vz8cn1h 2 months ago

    I have a small question about the video that slightly bothers me. What does this normal distribution we are sampling from consist of? If it's a distribution of latent vectors, how do we collect them during training?

  • @tempdeltavalue
    @tempdeltavalue 1 year ago +2

    16:27 It's unclear to me (in the context of the gradient operator and the expectation) why f_theta(z) can't be differentiated, and WHY replacing f_theta with g_theta(eps, x) allows us to move the gradient operator inside the expectation and "make something differentiable" (from a math point of view).
    P.S.
    In practice we train with MSE and the KL divergence between two Gaussians (q(z|x) and p(z)), where p_mean = 0 and p_sigma = 1, and this allows us to "train" the mean and var vectors in a VAE.

    • @raneisenberg2155
      @raneisenberg2155 1 year ago +1

      Thank you for the feedback :)
      I will try to address both items:
      1. The replacement makes the function (or the neural network) deterministic and thus differentiable. Looking at the definition of the derivative can help here: lim h->0 ( (f(x+h) - f(x)) / h ), where a slight change in x producing only a small change in f(x) is what makes the function "continuously differentiable". This is the case for the g function we defined in the video: a slight change in epsilon produces a slightly different z. On the other hand, i.i.d. sampling has, by definition, no relation between two subsequent samples, so there is no smooth dependence for the model to actually learn from.
      2. Yes, I've considered adding an explanation of the VAE loss function (ELBO), but I wanted the focus of the video to be solely on the trick itself, since it can also be used for other things like the Gumbel-Softmax distribution. I will consider making future videos on both the ELBO loss and the Gumbel-Softmax distribution.

    • @tempdeltavalue
      @tempdeltavalue 1 year ago +1

      @@raneisenberg2155
      Thanks for the answer! ❤
      Ohh, I just missed that we take a random sample.
      My confusion was that at 15:49 you have E_p_theta = "sum of terms" which contain z (the sample), and on the next slide you just remove them (by replacing z with epsilon and f with g).

    • @raneisenberg2155
      @raneisenberg2155 1 year ago +2

      Yes, I understand your confusion. The next slide, after re-parametrizing, does not split into two terms like the "sum of terms" you described. This is because the distribution we take the expectation over is no longer parametrized, so when calculating the gradient the situation changes: instead of a product of two functions (p_theta(z) * f_theta(z), like we had in the first slide), we now have only one function, with the distribution parameters encapsulated inside it (f_theta(g_theta(eps, x)), like we had in the second slide).
      Hope this helps :)
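
To connect this thread to the practical recipe mentioned above (MSE plus the KL between q(z|x) and a standard normal), here is a minimal sketch of a reparameterized VAE loss. It assumes PyTorch and hypothetical `encoder`/`decoder` modules, where the encoder returns the mean and log-variance of q(z|x):

```python
import torch

def vae_loss(x, encoder, decoder):
    # Hypothetical encoder returning the mean and log-variance of q(z | x).
    mu, logvar = encoder(x)

    # Reparameterized sample: gradients flow through mu and logvar, not eps.
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps

    # Reconstruction term: MSE between the input and the decoded sample.
    x_hat = decoder(z)
    recon = ((x - x_hat) ** 2).sum(dim=-1)

    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims.
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1.0).sum(dim=-1)

    return (recon + kl).mean()
```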

  • @wilsonlwtan3975
    @wilsonlwtan3975 1 month ago

    It is cool although I don't really understand the second half. 😅

  • @dennisestenson7820
    @dennisestenson7820 4 months ago +1

    The derivative of the expectation is the expectation of the derivative? That's surprising to my feeble mind.
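
It becomes less surprising once the distribution being averaged over no longer depends on theta. A rough sketch of the step the video leans on (swapping the gradient and the integral needs the usual regularity conditions):

```latex
% Before reparameterization the gradient also hits the density p_theta(z):
\nabla_\theta \, \mathbb{E}_{p_\theta(z)}\big[ f_\theta(z) \big]
  = \int \nabla_\theta \big( p_\theta(z)\, f_\theta(z) \big)\, dz
  \;\neq\; \mathbb{E}_{p_\theta(z)}\big[ \nabla_\theta f_\theta(z) \big]
  \quad \text{in general.}

% After reparameterization the density p(eps) does not depend on theta,
% so the gradient passes inside the expectation:
\nabla_\theta \, \mathbb{E}_{p(\epsilon)}\big[ f_\theta(g_\theta(\epsilon, x)) \big]
  = \int p(\epsilon)\, \nabla_\theta f_\theta(g_\theta(\epsilon, x))\, d\epsilon
  = \mathbb{E}_{p(\epsilon)}\big[ \nabla_\theta f_\theta(g_\theta(\epsilon, x)) \big].
```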