Best introductory video for DRL. Read lots of books or reviews but none of them explained it so clearly. Thank u so much for the excellent presentation
This course is excellent. The two instructors explain the concepts very well.
my very favorite... honestly i so much love this DL course... thanks for your efforts.
Another step towards the singularity.
loving the course so easy to understand
Nice lecture! However, it was hard for me to follow the idea of the loss function at 44:44. So it works if R_t is negative for low rewards and positive for high rewards, right?
We minimize the loss. By minimizing the negative log of the probability multiplied by the reward, we are actually optimizing for higher reward, which in a sense makes it gradient ascent.
@tommyholladay Totally agreed, but an easier way to explain this is that we take the negative of the log-likelihood because, for high rewards, we want to move in that direction in our algorithm, so we use the negative sign to reverse the direction of the gradient.
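For anyone who wants to see the sign convention concretely, here is a minimal sketch of that loss, assuming discrete actions and TensorFlow (the names `logits`, `actions`, and `rewards` are illustrative, not taken from the lecture):

```python
import tensorflow as tf

def compute_loss(logits, actions, rewards):
    """Policy-gradient loss: negative log-likelihood weighted by the return."""
    # Negative log-probability of the actions the agent actually took.
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    # Minimizing neg_log_prob * reward increases the probability of
    # high-reward actions, i.e. gradient ascent on expected return.
    return tf.reduce_mean(neg_log_prob * rewards)
```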
This is awesome, but how can those of us watching this recorded video on YouTube get the opportunity to practice with VISTA? Is there any arrangement for us?
Yes! VISTA is available to the public as well here: github.com/vista-simulator/vista
Also check out the VISTA-related Lab 3 in the class's open-source software labs for examples.
@AAmini Thanks, Prof., I will explore it. The data science community will forever appreciate your contribution to the growth of the field.
Hello, I have a question: when we do the training, what data is used to train the agent? Is it the environment (Carla, for example)? And can we transform the environment into images? I hope you can reply, sir; I have a university project. Thank you.
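In case it helps: the agent is trained on the observations the environment returns at each step, and in simulators like Carla those observations can be camera images. Here is a minimal sketch of that loop, assuming the Gymnasium API, with the CarRacing environment (which returns RGB frames) standing in for Carla:

```python
import gymnasium as gym

# CarRacing returns image observations of shape (96, 96, 3).
env = gym.make("CarRacing-v2")
obs, info = env.reset()

for _ in range(1000):
    # Random action here; in practice this would come from the
    # policy network applied to the image observation `obs`.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```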
Great Work !!!
Thanks a lot!
Hey, do you use a Mac or a Windows machine with Ubuntu installed on it?
Why does Tesla have 1,500 data labelers instead of using reinforcement learning?
Because actual accidents are much more costly.
Thank you !
Hello, Amini. Why can't I see the slides of this video on the homepage?
Excellent lecture; thank you.
Amazing ❤️
Now, that's the good stuff!!!
👏
Looking forward to it
Hallucinate? 🤔😭
Starcraft 2!!!!