The Reparameterization Trick

  • Published 17 Dec 2024

COMMENTS • 47

  • @advayargade746
    @advayargade746 10 months ago +4

    Sometimes understanding the complexity makes a concept clearer. This was one such example. Thanks a lot.

  • @s8x.
    @s8x. 8 months ago +4

    WOW! THANK U. FINALLY MAKING IT EASY TO UNDERSTAND. WATCHED SO MANY VIDEOS ON VAE AND THEY JUST BRIEFLY GO OVER THE EQUATION WITHOUT EXPLAINING

  • @biogirl18
    @biogirl18 1 year ago +3

    Holy God. What a great teacher..

  • @daniel66-dd
    @daniel66-dd 8 days ago

    Amazing explanation

  • @MonkkSoori
    @MonkkSoori 1 year ago +5

    Thank you for your effort, it all tied up nicely at the end of the video. This was clear and useful.

  • @PaulF-l3j
    @PaulF-l3j 1 year ago +1

    Very nice video, it helped me a lot. Finally someone explaining math without leaving the essential parts aside.

  • @slimanelarabi8147
    @slimanelarabi8147 11 months ago +1

    Thanks, this is a good explanation of the most obscure point of VAEs

  • @AkashaVaani-mx7cq
    @AkashaVaani-mx7cq 4 months ago +1

    Great work. Thanks...

  • @amirnasser7768
    @amirnasser7768 1 year ago +1

    Thank you, I liked your intuition, amazing effort.

    • @amirnasser7768
      @amirnasser7768 1 year ago

      Also, please correct me if I am wrong, but I think at minute 17 you should not use the same theta notation for both "g_theta()" and "p_theta()", since you assumed that you do not know the theta parameters for "p()" (the main cause of the differentiation problem) but you do know the parameters for "g()".

  • @carlosgruss7289
    @carlosgruss7289 7 months ago +2

    Very good explanation thank you

  • @chasekosborne2
    @chasekosborne2 1 year ago +1

    Thank you for this video, this has helped a lot in my own research on the topic

  • @abdelrahmanahmad3054
    @abdelrahmanahmad3054 1 year ago +2

    This is a life changing video, thank you very much 😊 🙏🏻

  • @salahaldeen1751
    @salahaldeen1751 1 year ago +1

    Thank you so much! Please continue with more videos on ML.

    • @ml_dl_explained
      @ml_dl_explained 1 year ago

      Will do :) let me know if you have a specific topic in mind.

  • @ettahiriimane4480
    @ettahiriimane4480 1 year ago +2

    Thank you for this video, this has helped me a lot

  • @sailfromsurigao
    @sailfromsurigao 2 months ago

    very clear explanation. subscribed!

  • @joshuat6124
    @joshuat6124 2 months ago

    Thanks for the video, subbed!

  • @franzmayr
    @franzmayr 2 months ago

    Great video! Extremely clear :)

  • @tonglang7090
    @tonglang7090 3 months ago

    Super clearly explained, thanks

  • @HuuPhucTran-jt4rk
    @HuuPhucTran-jt4rk 1 year ago +2

    Your explanation is brilliant! We need more things like this. Thank you!

  • @Gus-AI-World
    @Gus-AI-World 2 years ago +3

    Beautifully said. Love how you laid out things, both the architecture and math. Thanks a million.

  • @tempdeltavalue
    @tempdeltavalue 2 years ago +2

    16:27 It's unclear to me (in the context of the gradient operator and expectation) why f_theta(z) can't be differentiated, and WHY replacing f_theta with g_theta(eps, x) allows us to move the gradient operator inside the expectation and "make something differentiable" (from a math point of view).
    P.S. In practice we train with MSE and a KL divergence between two Gaussians (q(z|x) and p(z), where p_mean = 0 and p_sigma = 1), which allows us to "train" the mean and variance vectors in the VAE.

    • @ml_dl_explained
      @ml_dl_explained 2 years ago +1

      Thank you for the feedback :)
      I will try to address both items:
      1. The replacement makes the function (or the neural network) deterministic and thus differentiable and smooth. Looking at the definition of the derivative helps: lim h->0 ( (f(x+h) - f(x)) / h ). A slight change in x producing only a small change in f(x) is what makes the function "continuously differentiable", and that is the case for the g function we defined in the video: a slight change in epsilon produces a slightly different z. On the other hand, i.i.d. sampling has, by definition, no relation between two subsequent samples, so the result is not smooth enough for the model to actually learn.
      2. Yes, I've considered adding an explanation of the VAE loss function (ELBO), but I wanted the focus of the video to be solely on the trick itself, since it can also be used for other things like the Gumbel-Softmax distribution. I will consider making future videos on both the ELBO loss and the Gumbel-Softmax distribution.

    • @tempdeltavalue
      @tempdeltavalue 2 years ago +1

      @@ml_dl_explained
      Thanks for the answer! ❤
      Ohh, I just missed that we draw a random sample.
      My confusion was that at 15:49 you have E_p_theta = "sum of terms" which contain z (a sample), and on the next slide you just remove them (by replacing z with epsilon and f with g).

    • @ml_dl_explained
      @ml_dl_explained 2 years ago +2

      Yes, I understand your confusion. The next slide, with the re-parametrized form, does not split into two terms like the "sum of terms" you described. This is because the distribution we take the expectation over is no longer parametrized, so the gradient calculation changes: instead of a product of two functions (p_theta(z) * f_theta(z), like we had on the first slide), we now have only one function, with the distribution parameters encapsulated inside it (f_theta(g_theta(eps, x)), like we had on the second slide). A minimal code sketch of this is included right below this thread.
      Hope this helps :)
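
A short code sketch may make the exchange in this thread concrete. This is a minimal illustration and not the code from the video: PyTorch and the names mu, log_var, and reparameterize are assumptions. It shows how z = mu + sigma * eps keeps the randomness in a parameter-free eps, so the reconstruction term and the closed-form KL term mentioned above can both be backpropagated through mu and log_var.

```python
# Minimal sketch (assumed names, not the video's code) of the reparameterization trick.
import torch

torch.manual_seed(0)

# Pretend these are encoder outputs q(z|x) for a batch of 4 inputs with 2 latent dims.
mu = torch.zeros(4, 2, requires_grad=True)       # mean of q(z|x)
log_var = torch.zeros(4, 2, requires_grad=True)  # log-variance of q(z|x)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps with eps ~ N(0, I): the randomness carries no parameters,
    so gradients flow to mu and log_var through a deterministic transform."""
    sigma = torch.exp(0.5 * log_var)
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

z = reparameterize(mu, log_var)

# Stand-in for f_theta(z), e.g. a decoder plus an MSE reconstruction term.
recon_loss = (z ** 2).mean()

# Closed-form KL( q(z|x) || N(0, I) ), the term mentioned in the comment above.
kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())

(recon_loss + kl).backward()
print(mu.grad.shape, log_var.grad.shape)  # both parameters receive gradients
```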

  • @mohammedyasin2087
    @mohammedyasin2087 1 year ago +9

    This was the analogy I got from ChatGPT to understand the problem 😅. Hope it's useful to someone:
    "Certainly, let's use an analogy involving shooting a football and the size of a goalpost to explain the reparameterization trick:
    Imagine you're a football player trying to score a goal by shooting the ball into a goalpost. However, the goalpost is not of a fixed size; it varies based on certain parameters that you can adjust. Your goal is to optimize your shooting technique to score as many goals as possible.
    Now, let's draw parallels between this analogy and the reparameterization trick:
    1. **Goalpost Variability (Randomness):** The size of the goalpost represents the variability introduced by randomness in the shooting process. When the goalpost is larger, it's more challenging to score, and when it's smaller, it's easier.
    2. **Shooting Technique (Model Parameters):** Your shooting technique corresponds to the parameters of a probabilistic model (such as `mean_p` and `std_p` in a VAE). These parameters affect how well you can aim and shoot the ball.
    3. **Optimization:** Your goal is to optimize your shooting technique to score consistently. However, if the goalpost's size (randomness) changes unpredictably every time you shoot, it becomes difficult to understand how your adjustments to the shooting technique (model parameters) are affecting your chances of scoring.
    4. **Reparameterization Trick:** To make the optimization process more effective, you introduce a fixed-size reference goalpost (a standard normal distribution) that represents a known level of variability. Every time you shoot, you still adjust your shooting technique (model parameters), but you compare your shots to the reference goalpost.
    5. **Deterministic Transformation:** This reference goalpost allows you to compare and adjust your shooting technique more consistently. You're still accounting for variability, but it's structured and controlled. Your technique adjustments are now more meaningful because they're not tangled up with the unpredictable variability of the changing goalpost.
    In this analogy, the reparameterization trick corresponds to using a reference goalpost with a known size to stabilize the optimization process. This way, your focus on optimizing your shooting technique (model parameters) remains more effective, as you're not constantly grappling with unpredictable changes in the goalpost's size (randomness)."

    • @safau
      @safau 1 year ago

      oh my god !! So good.

    • @metalhead6067
      @metalhead6067 11 months ago

      dam nice bro, thank you for this

  • @openroomxyz
    @openroomxyz 3 months ago

    Thanks for the explanation

  • @Tinien-qo1kq
    @Tinien-qo1kq 5 months ago +1

    It is really fantastic

  • @matthewpublikum3114
    @matthewpublikum3114 1 month ago

    Isn't the random node, e, used here to parameterize the latent space, such that the user can explore the space by varying e?

  • @КириллКлимушин
    @КириллКлимушин 9 months ago

    I have a small question about the video that slightly bothers me. What does this normal distribution we are sampling from consist of? If it's a distribution of latent vectors, how do we collect them during training?

  • @vkkn5162
    @vkkn5162 7 months ago +1

    Your voice is literally from the "Giorgio by Moroder" song

  • @my_master55
    @my_master55 2 years ago +1

    Thanks for the vid 👋
    Actually lost the point in the middle of the math explanation, but that's prob because I'm not that familiar with VAEs and don't know some skipped tricks 😁
    I guess it's a bit clearer for people in the field :)

    • @ml_dl_explained
      @ml_dl_explained 2 years ago

      Thank you very much for the positive feedback 😊.
      Yes, the math part is difficult to understand and took me a few tries until I eventually figured it out. Feel free to ask any question about unclear aspects and I will be happy to answer here in the comments section.

  • @RezaGhasemi-gk6it
    @RezaGhasemi-gk6it 3 months ago

    Perfect!

  • @jinyunghong
    @jinyunghong 1 year ago +1

    Thank you so much for your video! It definitely saved my life :)

  • @wilsonlwtan3975
    @wilsonlwtan3975 8 months ago

    It is cool although I don't really understand the second half. 😅

  • @dennisestenson7820
    @dennisestenson7820 11 months ago +4

    The derivative of the expectation is the expectation of the derivative? That's surprising to my feeble mind.

    • @alexmurphy6100
      @alexmurphy6100 3 months ago +1

      You will often hear people talk about expectation being a linear operator, particularly when it comes to this fact about derivatives. The linearity-of-differentiation property in calculus tells us this works for all linear transformations of functions (see the short sketch below this thread).

    • @tahirsyed5454
      @tahirsyed5454 25 days ago

      They're both linear, and commute.
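
A short sketch of why the two operators commute in the reparameterized form, using the notation from the threads above (f_theta, g_theta, eps) and assuming the usual regularity conditions: once the expectation is over a fixed distribution p(eps) that does not depend on theta, the gradient moves inside the integral, whereas an expectation over p_theta(z) also picks up a gradient-of-p_theta(z) term via the product rule.

```latex
% Sketch (assumed notation): gradient and expectation commute once the sampling
% distribution p(\epsilon) no longer depends on \theta.
\[
\nabla_\theta \, \mathbb{E}_{\epsilon \sim p(\epsilon)}\big[ f_\theta(g_\theta(\epsilon, x)) \big]
  = \nabla_\theta \int p(\epsilon)\, f_\theta(g_\theta(\epsilon, x))\, d\epsilon
  = \int p(\epsilon)\, \nabla_\theta f_\theta(g_\theta(\epsilon, x))\, d\epsilon
  = \mathbb{E}_{\epsilon \sim p(\epsilon)}\big[ \nabla_\theta f_\theta(g_\theta(\epsilon, x)) \big].
\]
% In contrast, when the sampling distribution itself carries the parameters,
% the product rule adds a term involving \nabla_\theta p_\theta(z):
\[
\nabla_\theta \, \mathbb{E}_{z \sim p_\theta(z)}\big[ f_\theta(z) \big]
  = \int \big( f_\theta(z)\, \nabla_\theta p_\theta(z)
             + p_\theta(z)\, \nabla_\theta f_\theta(z) \big)\, dz .
\]
```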

  • @wodniktoja8452
    @wodniktoja8452 7 months ago

    dope