This is super clear. Thanks so much for making this video.
The greatest video I could have watched to understand MDPs.
Such an insightful discussion-based explanation. Great 👍
Awesome videos, thank you.
So the policy tells you the next action to take so that you eventually reach the reward?
You're a genius
When it comes to explanation, I mean.
Is the policy's decision based on the model?
no part 5?
Is the reward just given? Or where does the reward come from? Is it equivalent to labeled data in supervised learning?
I think it is more like the cost function that measures whether the prediction matches the label: it is a numerical function indicating what you want the algorithm to optimize, like matching labels in classification or getting closer to the goal in navigation.
@@oldcowbb So we should store the reward for every state?
@@braineedly7543 Well, you can't solve an MDP without the rewards, so yes: every state you evaluate needs a reward you can look up, as in the sketch below.
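To make that thread concrete, here is a minimal sketch of how a reward usually enters an MDP. The 1-D chain layout and the names GOAL, N_STATES, and GAMMA are assumptions for illustration, not from the video: the designer writes the reward as a function of state (or state and action), and the solver looks it up for every state it evaluates, which is why you can either store it in a table or compute it on demand, unlike a supervised label attached to each training example.

```python
# Minimal sketch: rewards in a tiny MDP (assumed 1-D chain world, not from the video).
# The designer defines the reward function up front; the solver consults it per state.

N_STATES = 5          # states 0..4; state 4 is the goal (assumption)
GOAL = 4
ACTIONS = (-1, +1)    # move left or right
GAMMA = 0.9           # discount factor

def reward(state: int) -> float:
    """Numerical signal the agent optimizes: +1 at the goal, a small cost elsewhere."""
    return 1.0 if state == GOAL else -0.04

def step(state: int, action: int) -> int:
    """Deterministic transition: the goal is absorbing, other states move along the chain."""
    if state == GOAL:
        return state
    return min(max(state + action, 0), N_STATES - 1)

# A few sweeps of value iteration: every state's reward is consulted each sweep,
# so in practice you keep it in a table or a function like reward() above.
values = [0.0] * N_STATES
for _ in range(50):
    values = [
        max(reward(s) + GAMMA * values[step(s, a)] for a in ACTIONS)
        for s in range(N_STATES)
    ]

# The greedy policy then just picks the action leading to the best-valued next state.
policy = [max(ACTIONS, key=lambda a, s=s: values[step(s, a)]) for s in range(N_STATES)]
print(values, policy)
```

So the reward is given by whoever defines the problem, and the policy answers the earlier question too: it is the mapping from each state to the action the agent should take there.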
The instructor is excellent; unfortunately, the explanation is slowed down and sometimes "blurred" by the non-stop interjections. I believe a single voice would be more than enough.