My favorite video on VAEs; the derivation of the ELBO is much clearer than in other resources I've found online. Awesome resource.
By far the best resource I've found on VAEs, after _lots_ of reading and video watching. This puts it all together intelligently and clearly. Thank you!!
I absolutely love the way you presented a visual illustration on how the reconstruction and KL-Divergence loss terms affect the latent space.
This video is amazing! I like how you bring the "reparametrization trick" into the picture: you first calculate the gradient separately to show the potential issue. Super clear!
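In case a code view helps anyone else: here's a minimal PyTorch-style sketch of the reparametrization trick and the two loss terms mentioned in the comments above. This is my own illustration under the usual assumptions (Gaussian encoder, Gaussian decoder with identity covariance); the function names are made up and nothing here is taken verbatim from the video.

```python
import torch

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness lives in eps,
    # so gradients can flow through mu and log_var.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term: squared error, i.e. a Gaussian decoder with identity covariance.
    recon = ((x - x_recon) ** 2).sum(dim=-1)
    # KL term: pushes each q(z|x) = N(mu, diag(sigma^2)) towards the N(0, I) prior.
    kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(dim=-1)
    return (recon + kl).mean()
```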
Best video on VAEs I found on YouTube :D Thanks a lot!
Thank you so much for this, really cleared up how VAEs work
Very neat, I will look forward to more of your content.
You put everything together and showed it working in the video; that's really helpful. Thank you!
Thank you very much for your work; it has been decisive in helping me understand the basis of this type of model, and it will surely be of great help in understanding how others work.
best explanation on vaes
Oh I'm so glad I found this video
NOTES:
- Slight typo at 15:20. Where it says "ELBO" within the integral, it's just the difference of logs, whereas the ELBO is actually the expectation of the difference of logs over q(z|x) (written out below).
- At 20:36 (the Gaussian case), I've phrased it in univariate terms (even though the Gaussian is generally multivariate); however, we would still recover the result that what we're calculating is a Euclidean distance between x and mu, given that our covariance matrix is an identity matrix.
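For anyone reading along, here are both notes written out explicitly (standard VAE notation, which I believe matches the video's; θ and φ are the decoder and encoder parameters):

```latex
% The ELBO is the expectation, over q(z|x), of the difference of logs:
\mathrm{ELBO}(\theta, \phi) =
  \mathbb{E}_{q_\phi(z \mid x)}\big[ \log p_\theta(x, z) - \log q_\phi(z \mid x) \big]

% Gaussian decoder with identity covariance: the log-likelihood reduces to a
% squared Euclidean distance between x and mu, up to scaling and an additive constant:
\log \mathcal{N}(x \mid \mu, I) = -\tfrac{1}{2} \lVert x - \mu \rVert^2 + \text{const}
```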
This video was a masterpiece
At 8:46, why is the joint probability tractable? Why are the others not tractable?
The joint probability is tractable under our model because it's easy to evaluate it at, say, a particular x and z (it's just the decoder likelihood times the prior); however, the marginal probability of x requires integrating over all z, which makes it intractable.
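A quick way to see this (my paraphrase in standard VAE notation, not a quote from the video):

```latex
% Tractable: the joint factors into the decoder likelihood and the prior,
% both of which the model lets us evaluate directly
p_\theta(x, z) = p_\theta(x \mid z)\, p(z)

% Intractable: the marginal requires integrating that joint over all of z
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz
```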
thanks
Hello, I can't understand the step at 15:20. You expand the expectation and bring the gradient into the integral, but why do you replace the difference of logs with the ELBO?? The ELBO is the expectation of the difference of logs over q, isn't it?
Hi Carlos, thanks for your comment! Yes, it looks like a little typo, and I'll make sure to add a comment clarifying this.
@deepbean Thank you very much for all your work.
At 13:25 it was stated that maximizing the ELBO also maximizes the evidence and minimizes the KL divergence. However, you did not prove/show how this can happen. Actually, I thought the evidence was a constant, because when we gather a dataset of {x1, x2, ... xn}, p_theta(xi) is constant (when theta is constant).
Not 100% sure on this answer, but intuitively it seems there are two ways to improve the ELBO: one is to reduce the KL divergence (which is the distance from the upper bound), and the second is to increase the upper bound directly (which corresponds to maximizing the evidence).
Yup, that's right! No rigorous proof here for ELBO maximisation (which just stems from the intractability issue), but this is how you can maximise it in practice.
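For anyone following this thread, the identity behind both answers (the same decomposition the video derives, as far as I can tell) is:

```latex
% Evidence = ELBO + KL, and KL >= 0, so the ELBO is a lower bound on the evidence.
\log p_\theta(x) = \mathrm{ELBO}(\theta, \phi)
  + \mathrm{KL}\big( q_\phi(z \mid x) \,\|\, p_\theta(z \mid x) \big)

% With theta fixed the evidence is indeed constant, so raising the ELBO over phi
% can only shrink the KL term; raising it over theta pushes the evidence up,
% since the ELBO sits below it.
```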
I just came off of about three hours of lectures on VAEs in the Stanford "Deep Generative Models" course, and they didn't do as good of an explanation as you did here 🤷♂
Why don't I see such detailed explanation videos as PDFs? Do you guys think you'll get rich from a couple hundred views on YouTube??? Make a PDF (including ALL your explanations during the video!). Having only an online version, plus having to click stop and juggle with the video position slider and never having a continuous presentation in front of my eyes, is just plain nonsense for this kind of material. Basically, it seems (from what I've been able to follow) to be a very well done presentation in terms of content, but totally useless in the form in which the content is delivered.
PS: I've never heard this so clearly stated before: "a VAE maps a point in X to a distribution in Z by pushing the distributions towards the prior of Z, which is a unit Gaussian, which encourages the distributions to overlap and fill all the space of the prior of Z". Just brilliant.
Fantastic explanation. Small erratum: the variable 𝜖 is epsilon, not eta (η): ua-cam.com/video/HBYQvKlaE0A/v-deo.htmlsi=k6EBUeCbMUl4JYWw&t=970
Ah, that's right!