Awesome content Alexander and Ava, your efforts are much appreciated!
It's quite amazing what is possible in this field and just how fast it is developing. I mean, I try to stay on top of these things, and even I get blown away from time to time. For example, just how different, yet accurate, the faces at 40:00 are.
5:56
4th row, 4th column: Shahid Afridi
Thank you for sharing the class. These courses are great, covering many aspects of deep learning.
One question about the slide "VAEs: Latent perturbation" (around 29:00):
If "smile" and "head pose" are independent of each other, then when looking at the faces row by row, why does the extent of smiling differ? Supposedly only "head pose" varies along a row while "smiling" remains constant?
The problems only seem to appear in the 4th and 5th rows.
Great question! While an ideal encoding would promote independence among latent variables, this is not always the case with the variables that are actually learned via gradient descent. This leads to the problem you describe, where the variables are mostly "smile" and "head pose" but still share some overlap (or entanglement). In fact, a huge field of research goes into "disentangling" latent variables so they are independent of each other. Check out work on "Beta-VAEs", which try to impose constraints during learning to promote this type of behavior.
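For anyone who wants to see the Beta-VAE idea concretely, here is a minimal sketch of its objective, assuming a Gaussian encoder that outputs `mu` and `logvar` for image batches (the function and argument names are illustrative, not from the lecture):

```python
import tensorflow as tf

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # Reconstruction term: pixel-wise squared error, summed per example.
    recon = tf.reduce_sum(tf.square(x - x_recon), axis=[1, 2, 3])
    # Closed-form KL divergence between N(mu, sigma^2) and the N(0, 1) prior,
    # summed over latent dimensions. Setting beta > 1 strengthens this penalty,
    # which empirically encourages more disentangled latents.
    kl = -0.5 * tf.reduce_sum(1.0 + logvar - tf.square(mu) - tf.exp(logvar), axis=1)
    return tf.reduce_mean(recon + beta * kl)
```

With beta = 1 this reduces to the standard VAE loss; the extra KL pressure is the "constraint during learning" mentioned above.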
I am a bit confused about the continuity of the latent variable at around 28:00 (VAEs: Latent perturbation). What exactly does this continuity have to do with the normal distribution? Won't latent perturbation still work if we use traditional AEs? If it won't work, I could surmise it is because the network is not penalized for "cheating" in traditional AEs; but then it would be the regularization, rather than the random factor \epsilon, that is responsible for this property of continuity?

This leads to another confusion: previously, I intuitively understood the advantage of VAEs to be that the stochastic approach introduces randomness to "loosen" the network a bit and prevent it from cheating / doing something similar to overfitting, but now it seems the regularization is doing that job. So what exactly is the role of the randomness in the advantage of VAEs? Is it simply that, to have such regularization, we must introduce some stochastic element into the model? Sorry for bombarding you with questions, and many thanks in advance!!
Superb content, looking forward to more of this kind...
In autoencoders and variational autoencoders our loss function was just a squared error, so why are we using a discriminator in GANs instead of just a squared error?
Is it because in AEs we go from an original image through a compressed latent space back to the prediction, so it needs only a little tuning, while in GANs we go from pure noise to a predicted image, so it needs stronger guidance? Please clarify this doubt.
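For contrast, here is a minimal sketch of the two training setups, assuming logits from some discriminator network (all names are illustrative):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

# Autoencoder-style training: every input x has a ground-truth target (itself),
# so a simple per-pixel squared error is well defined.
def ae_loss(x, x_recon):
    return tf.reduce_mean(tf.square(x - x_recon))

# GAN training: a sample generated from noise has no per-pixel target,
# so a learned discriminator judges whether the image looks real overall.
def discriminator_loss(real_logits, fake_logits):
    return (bce(tf.ones_like(real_logits), real_logits) +
            bce(tf.zeros_like(fake_logits), fake_logits))

def generator_loss(fake_logits):
    # The generator improves when the discriminator labels its samples as real.
    return bce(tf.ones_like(fake_logits), fake_logits)
```

The key point is that a generated sample has no paired ground-truth image to compute a squared error against, so the discriminator acts as a learned loss instead.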
At 27:05, shouldn't the backprop at phi in the reparametrized form be (partial z / partial phi)?
In backprop you are always looking for the gradient of the last function (in this case f) with respect to the weights (phi), i.e., you want (partial f / partial phi). Using the chain rule: (partial f / partial phi) = (partial f / partial z) * (partial z / partial phi). In computational diagrams it is usually shown as in the slides.
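A minimal sketch of that chain rule in the reparameterized form, where z = mu + sigma * eps; automatic differentiation composes exactly the factorization described above (variable names are illustrative):

```python
import tensorflow as tf

mu = tf.Variable(0.5)        # stands in for an encoder output parameterized by phi
log_sigma = tf.Variable(0.0)

with tf.GradientTape() as tape:
    eps = tf.random.normal(())        # stochastic node; no gradient flows into it
    z = mu + tf.exp(log_sigma) * eps  # reparameterized sample, deterministic in mu, log_sigma
    f = tf.square(z - 1.0)            # some downstream loss f(z)

# Autodiff computes (partial f / partial mu) = (partial f / partial z) * (partial z / partial mu),
# which is exactly the chain rule from the slide.
grads = tape.gradient(f, [mu, log_sigma])
```

Because eps absorbs all the randomness, z is a deterministic function of the parameters and the gradient can flow through it.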
Alexander, Ava, the lecture course is very good! If the labs had used tf2.0, it would have been priceless. But in any case, excellent work. From Russia with love ;)
Lol lol lol awesome...
Learnt a lot!!
Thank you from Korea♡
In the original VAEs, since you've already given a prior, how are the means and variances stochastic in nature? I don't get it.
Can you also teach how to write code for it?
Thanks a lot for sharing! Great content.
The only thing that I'd like to ask is: what's the difference between encoding and embedding?
To me they're synonymous; it's just that "embedding" is mostly used in natural-language contexts, e.g. text, word, sentence, or document embeddings.
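A rough sketch of how the two words typically show up in code (purely illustrative):

```python
import tensorflow as tf

# "Embedding" usually means a learned lookup table over discrete tokens:
embed = tf.keras.layers.Embedding(input_dim=10000, output_dim=64)
word_vectors = embed(tf.constant([[3, 17, 42]]))   # token ids -> 64-d vectors

# "Encoding" more generally means the output of an encoder network applied
# to a (possibly continuous) input, e.g. the latent code of an autoencoder:
encoder = tf.keras.layers.Dense(64, activation="relu")
latent_code = encoder(tf.random.normal((1, 784)))  # pixels -> 64-d code
```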
Thank you very much for sharing.
Do you know of any resources for making deepfakes like the Obama introduction to the course?
Thank you guys for sharing this valuable content.
Excellent!
Awesome Stuff
Brilliant
I love this video!!!!
I feel small
I feel stupid
Both faces are fake. The man has an asymmetric mustache, and the woman's shirt is weird (the right shoulder is not fully covered but the left shoulder is).
There is an audio problem with this series of lectures; do you all agree?
Sony is reasonably cheaper and better than the rest... (if you're planning...)