Great lecture; it was nice to have the derivation of the objective spelled out in such an intuitive way.
To the question at 21:05: the estimate is unbiased - it is correctly stated that it is asymptotically exact - but it has high variance. The variational inference approach offers a different tradeoff: it lowers the variance at the cost of the bias introduced by restricting the family of proposal latent densities q(z).
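To make the tradeoff concrete, here is a minimal sketch of importance sampling for the marginal likelihood p(x) = E_{q(z)}[p(x|z) p(z) / q(z)] on a toy 1-D Gaussian model. The model, numbers, and function names are illustrative assumptions, not from the lecture; the point is that the estimate is unbiased for any valid proposal q, but its variance depends heavily on how close q is to the true posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an illustrative assumption, not from the lecture):
# prior p(z) = N(0, 1), likelihood p(x|z) = N(x; z, 0.5^2), observation x = 2.
x = 2.0

def log_p_z(z):
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_p_x_given_z(z):
    return -0.5 * ((x - z) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))

def importance_estimate(mu_q, sigma_q, n):
    """Unbiased importance-sampling estimate of p(x) = E_{q(z)}[p(x|z) p(z) / q(z)]."""
    z = rng.normal(mu_q, sigma_q, size=n)
    log_q = -0.5 * ((z - mu_q) / sigma_q) ** 2 - np.log(sigma_q * np.sqrt(2 * np.pi))
    return np.mean(np.exp(log_p_x_given_z(z) + log_p_z(z) - log_q))

# A proposal far from the posterior gives an unbiased but noisy estimate;
# one close to the true posterior p(z|x) = N(1.6, 0.2) is far less noisy.
for mu_q, sigma_q in [(0.0, 1.0), (1.6, 0.45)]:
    estimates = [importance_estimate(mu_q, sigma_q, 100) for _ in range(1000)]
    print(f"q = N({mu_q}, {sigma_q}^2): "
          f"mean = {np.mean(estimates):.4f}, std = {np.std(estimates):.4f}")
```

Restricting q to a parametric family and fitting it to the posterior, as variational inference does, buys exactly this variance reduction at the price of bias when the family cannot contain the true posterior.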
At time 1:15:22 (slide 55), for the likelihood ratio gradient on the VAE, I did the calculation myself. Writing the expectation as a sum over all z, with f_\phi(z) = \log p_z(z) + \log p_\theta(x|z) - \log q_\phi(z|x),
we start with the term (1/q_\phi) \nabla_\phi (q_\phi \, f_\phi) = (1/q_\phi) [\nabla_\phi q_\phi] \, f_\phi + (1/q_\phi) \, q_\phi \, \nabla_\phi f_\phi.
I see that the expected value of the second term is 0. For the first term, when I take the expectation, it becomes \sum_z q_\phi \cdot (1/q_\phi) [\nabla_\phi q_\phi] \, f_\phi = \sum_z [\nabla_\phi q_\phi] \, f_\phi,
and I am not sure how to proceed. It is not the same case as slide 56, because there we have q_\phi(z|x) \, f(z) with f independent of \phi; here we have q_\phi(z|x) [\log p_z(z) + \log p_\theta(x|z) - \log q_\phi(z|x)].
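A sketch of the step that seems to be missing (my reading, not taken from the slides): apply the log-derivative identity \nabla_\phi q_\phi = q_\phi \nabla_\phi \log q_\phi to the remaining sum,

\[
\sum_z \big[\nabla_\phi q_\phi(z|x)\big]\, f_\phi(z)
  = \sum_z q_\phi(z|x)\, \big[\nabla_\phi \log q_\phi(z|x)\big]\, f_\phi(z)
  = \mathbb{E}_{q_\phi(z|x)}\big[\, f_\phi(z)\, \nabla_\phi \log q_\phi(z|x) \,\big].
\]

This recovers the slide-56 form \mathbb{E}_q[f(z) \nabla_\phi \log q_\phi(z|x)], with f(z) = \log p_z(z) + \log p_\theta(x|z) - \log q_\phi(z|x) treated as a fixed function of the sampled z; its extra \phi-dependence contributes exactly the second term, which was already shown to vanish in expectation.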
Is the code for the examples available, or is it possible to get it?
On GitHub you can find the 2020 edition of the course, with demos and solutions.
Just goes to show how much statistics and probability is required here... I did not know what "importance sampling" was. Great lecture though!
What does he mean by "I sample z" at minute 23?
What book should we read? Please recommend a book to start from, and share any accompanying documents.
Which probability course should I revise to understand all this quickly? Something covering sampling... (from YouTube) or elsewhere?
@40:13 Yes, but how do you backpropagate through such a thing? That's not clear at all.
It is all the more difficult to understand because the neural nets are hidden inside the pdfs...
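If the difficulty is backpropagating through the sampling of z, here is a minimal PyTorch sketch of the reparameterization trick (an assumption about what the question at 40:13 refers to; the loss below is a stand-in, not the lecture's objective). Instead of sampling z ~ N(mu, sigma^2) directly, sample eps ~ N(0, 1) and set z = mu + sigma * eps, so gradients flow through mu and sigma:

```python
import torch

# Encoder outputs (stand-ins for a neural net's outputs; hypothetical values).
mu = torch.tensor([0.5], requires_grad=True)
log_sigma = torch.tensor([0.0], requires_grad=True)

eps = torch.randn(1)                    # noise: the randomness carries no gradient
z = mu + torch.exp(log_sigma) * eps     # z is now differentiable w.r.t. mu, log_sigma

loss = (z ** 2).sum()                   # stand-in for the (negative) ELBO term
loss.backward()
print(mu.grad, log_sigma.grad)          # well-defined gradients through the sample
```

The pdf never has to be differentiated through the sampling operation itself; the stochasticity is pushed into eps, which does not depend on the parameters.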
I guess because I am below average, I find the lecture very difficult to follow. Thanks for sharing, though.