26:15 how AlphaGo uses Monte Carlo Tree Search
wow, thanks for sharing the lecture
The AlphaGo example was so neat!
At minute 21:13, the line before the stochastic gradient should be E[\gamma^n G_n \frac{\nabla \pi_\theta(A_n|S_n)}{\pi_\theta(A_n|S_n)}]
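For anyone puzzling over that correction: by the log-derivative identity \nabla \log \pi = \nabla \pi / \pi, the corrected line is just the usual REINFORCE gradient (a sketch of the standard derivation, not a quote from the slide):

\nabla_\theta J(\theta) = E\left[\gamma^n G_n \frac{\nabla_\theta \pi_\theta(A_n|S_n)}{\pi_\theta(A_n|S_n)}\right] = E\left[\gamma^n G_n \nabla_\theta \log \pi_\theta(A_n|S_n)\right]

so the stochastic gradient update from one sampled step is \theta \leftarrow \theta + \alpha \, \gamma^n G_n \nabla_\theta \log \pi_\theta(A_n|S_n).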
Thanks for the great lecture prof. :)
Thanks Prof Poupart.
Amazing lecture. Does anyone know why at 12:14 we are discounting twice?
It's not discounting twice: the first gamma^n represents the state/reward distribution after executing the policy for n steps, while the second gamma^t represents the future expected reward from that point on.
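To make the two discounts concrete (a sketch in standard REINFORCE notation, not necessarily the lecture's exact symbols): the policy gradient sums over the step n at which the action is taken,

\nabla_\theta J(\theta) = E\left[\sum_{n \ge 0} \gamma^n G_n \nabla_\theta \log \pi_\theta(A_n|S_n)\right], \quad G_n = \sum_{t \ge 0} \gamma^t R_{n+t+1}

so a reward received t steps after step n carries total weight \gamma^{n+t}, which is exactly its discount from time 0 in the original objective. Nothing is counted twice.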
Thank you!
35:34 Resign is just the same as calling gg. :)))