Solved Example on Q-learning in Reinforcement Learning/Q-Learning example

  • Published 28 Aug 2024
  • Q-learning Task, Q-learning Algorithm, Solved Example on Q-Learning

COMMENTS • 27

  • @i_izhar03 · 1 year ago · +2

    15:42 Why didn't you choose the path 1->2->4->5->6? Its sum is 380, while the sum along 1->2->3 is only 190. Why do we choose this path over 1->2->4->5->6?

    • @sriveni · 1 year ago · +3

      I have given one optimal policy; you can choose any. But in your path you cannot go to 4 without passing through 3, and since 3 is the goal state, we stop there. Moreover, the agent is not aware of the rewards, so it has to proceed by trial and error only. After a certain number of iterations the agent may find the most profitable path.
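To make the reply concrete, here is a minimal Python sketch of trial-and-error Q-learning in which every episode stops at the goal, followed by reading off the greedy path. The graph, the reward of 100 for entering goal state 3, γ = 0.9 and the learning rate of 1 are all assumptions chosen for illustration (γ = 0.9 is merely consistent with the Q-values 100, 90 and 81 quoted in this thread); this is not the video's R matrix.

```python
import random

# Hypothetical 6-state graph (NOT the video's R matrix): 1-2-3 and 4-5-6-3, goal = 3.
N, GOAL, GAMMA = 6, 3, 0.9
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 3)]


def build_r():
    """R[s][t] = reward for moving from state s+1 to t+1; -1 means the move is not allowed."""
    r = [[-1] * N for _ in range(N)]
    for a, b in EDGES:
        for s, t in ((a, b), (b, a)):
            if s != GOAL:  # the goal is terminal, so no move leaves it
                r[s - 1][t - 1] = 100 if t == GOAL else 0
    return r


def train(r, episodes=500, seed=0):
    """Trial-and-error Q-learning with learning rate 1: Q(s,t) = R(s,t) + gamma * max Q(t,.)."""
    rng = random.Random(seed)
    q = [[0.0] * N for _ in range(N)]
    starts = [x for x in range(N) if x != GOAL - 1]
    for _ in range(episodes):
        s = rng.choice(starts)  # random non-goal start state
        while s != GOAL - 1:  # the episode stops as soon as the goal is reached
            t = rng.choice([j for j in range(N) if r[s][j] >= 0])  # random allowed move
            # invalid moves keep Q = 0 and valid Q-values are >= 0 here, so max over the row is safe
            q[s][t] = r[s][t] + GAMMA * max(q[t])
            s = t
    return q


def greedy_path(q, r, start):
    """Follow the highest-Q allowed move from `start` (1-based) until the goal is reached."""
    path, s = [start], start - 1
    while s != GOAL - 1:
        if len(path) > N:
            raise RuntimeError("no path to the goal yet; train for more episodes")
        s = max((j for j in range(N) if r[s][j] >= 0), key=lambda j: q[s][j])
        path.append(s + 1)
    return path


R = build_r()
Q = train(R)
print(greedy_path(Q, R, 1))  # [1, 2, 3]
print(greedy_path(Q, R, 4))  # [4, 5, 6, 3]
```

Because each episode ends on reaching 3, no learned path continues past the goal; in this sketch, which route the agent follows depends only on where it starts.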

  • @Solinvencible · 5 months ago

    Thank you, Sriveni! A very well-explained example!

    • @sriveni · 5 months ago · +1

      Glad you liked it. Please do check my other videos and share them with anyone who will benefit.

  • @starultra2863 · 7 months ago

    At 13:51, Q(1,2) = 90, which is correct. But if we had started solving the problem from state 1 --> 2, then Q(1,2) would have been 0 in the first step, right?

    • @sriveni · 6 months ago

      Yes, you are correct. That is why you have to start with some non-zero value; only then will the Q-matrix be updated. Since we are doing it manually, we started with Q(2,3); the same is not the case with the algorithm.
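A small sketch of why the order matters: with every Q-value initialized to 0, updating Q(1,2) before anything has been learned about state 2 gives 0, while updating it after Q(2,3) gives 90. The two-step chain 1 -> 2 -> 3, the reward of 100 for entering goal 3 and γ = 0.9 are assumptions for illustration. When the algorithm runs on its own, the same propagation happens over repeated episodes.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical fragment of the example: 1 -> 2 -> 3, reward 100 only for entering goal 3.
R = {(1, 2): 0, (2, 3): 100}
Q = {(1, 2): 0.0, (2, 3): 0.0}              # Q starts at zero everywhere
NEXT_ACTIONS = {2: [(2, 3)], 3: []}          # state 3 is the goal: no outgoing moves


def update(s, t):
    """Deterministic update Q(s,t) = R(s,t) + gamma * max over moves out of t."""
    best_next = max((Q[a] for a in NEXT_ACTIONS[t]), default=0.0)
    Q[(s, t)] = R[(s, t)] + GAMMA * best_next
    return Q[(s, t)]


first = update(1, 2)   # Q(2,3) is still 0, so nothing propagates yet
update(2, 3)           # the goal reward enters the table: 100.0
second = update(1, 2)  # now 0 + 0.9 * 100
print(first, second)   # 0.0 90.0
```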

  • @MS_P007 · 3 months ago

    Thank you for the lecture, ma'am ❤️

    • @sriveni · 3 months ago

      Thank you 😊

  • @ramazandurmaz3012 · 10 months ago

    Why do we take only next states that have non-negative Q values? Also, there are 6 cells/states that one can end up in, but I can see only 4 actions (up, down, right, left). So why do we have a 6x6 matrix?

    • @sriveni · 10 months ago

      As per the problem definition, the reward should be maximized, so take only non-negative Q values. You can take a matrix of any size. This is like a board game where you can move only one step up, down, right, or left.
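One common way to read the 6x6 shape, assuming the video follows the usual room-graph formulation: rows index the current state, columns index the next state, and an action is simply "move to state j". Entries of -1 in R mark moves that are not possible, so the candidate actions from a state are the non-negative entries of its row. The layout below is hypothetical, not the video's board.

```python
N, GOAL = 6, 3
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 3)]  # hypothetical layout, not the video's board

# Rows = current state, columns = next state (states 1..6 stored at indices 0..5).
# An action is "move to state j", so R is 6x6 even if each physical move on a
# board is one of up/down/left/right.
R = [[-1] * N for _ in range(N)]  # -1 marks a move that is not possible
for a, b in EDGES:
    for s, t in ((a, b), (b, a)):
        if s != GOAL:  # the goal is terminal, so its row stays all -1
            R[s - 1][t - 1] = 100 if t == GOAL else 0


def valid_actions(state):
    """Allowed next states from `state`: exactly the non-negative entries of its R row."""
    return [j + 1 for j, reward in enumerate(R[state - 1]) if reward >= 0]


print(valid_actions(2))  # [1, 3]
print(valid_actions(5))  # [4, 6]
print(valid_actions(3))  # []  (goal: the episode ends here)
```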

    • @ramazandurmaz3012 · 10 months ago

      @sriveni I understand. One more question, though: why don't we include the learning rate alpha? Or at what step do we need it?

    • @sriveni · 10 months ago

      @ramazandurmaz3012 You can use it to update the Q value using the formula
      Q(s,a)
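The formula in this reply is cut off in the source. As a hedged sketch, this is the standard tabular Q-learning update with a learning rate α, Q(s,a) ← Q(s,a) + α·[r + γ·max_a' Q(s',a') - Q(s,a)], which may or may not be the exact form used in the video. With α = 1 it reduces to Q(s,a) = r + γ·max_a' Q(s',a'), which appears to be the rule used when the table is filled in by hand. The default γ = 0.9 is an assumption.

```python
def q_update(q_sa, reward, max_q_next, alpha=1.0, gamma=0.9):
    """Standard tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a)).
    With alpha = 1 this becomes Q(s,a) = reward + gamma * max_a' Q(s',a')."""
    td_target = reward + gamma * max_q_next  # what this single experience suggests
    return q_sa + alpha * (td_target - q_sa)  # move only a fraction alpha toward it


print(q_update(0.0, 0, 100.0))             # 90.0 (alpha = 1: jump straight to the target)
print(q_update(0.0, 0, 100.0, alpha=0.5))  # 45.0 (alpha = 0.5: halfway there)
```

A smaller α makes each update more cautious, which matters when rewards or transitions are noisy; in a deterministic hand-worked example α = 1 converges fastest.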

  • @user-cu6gf2rj6y · 7 months ago

    Do we have to update the R matrix as well? At a certain point the rewards in the Q matrix get updated while R remains as it is. Until when do we have to use the R matrix if it is not updated?

    • @starultra2863 · 7 months ago · +1

      No, we don't have to update the R matrix. The R matrix remains constant; only the Q matrix changes. The R matrix serves as a guide for updating the Q matrix with the right values.

    • @sriveni · 6 months ago

      Yes, you need not update the R matrix.
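A minimal sketch of the point made in these replies: R is only ever read, and everything learned is written into Q. Storing R as a tuple of tuples is a design choice for this illustration (not something from the video) that makes any accidental write raise TypeError. The 3-state chain, the reward values and γ = 0.9 are assumptions.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical 3-state chain 1 -> 2 -> 3 (goal), 0-based indices; -1 = move not allowed.
# A tuple of tuples cannot be modified, so R stays constant by construction.
R = (
    (-1, 0, -1),   # from state 1: can move to 2
    (0, -1, 100),  # from state 2: back to 1, or into goal 3 (reward 100)
    (-1, -1, -1),  # from state 3: goal, the episode ends
)
Q = [[0.0] * 3 for _ in range(3)]  # only Q is ever written


def update(s, t):
    """One Q-update for the move s -> t; R is only read."""
    Q[s][t] = R[s][t] + GAMMA * max(Q[t])


for _ in range(3):  # a few passes over all allowed moves
    for s, t in [(0, 1), (1, 0), (1, 2)]:
        update(s, t)

print(Q[0][1], Q[1][2])  # 90.0 100.0
```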

  • @sundarammuthu5020 · 10 months ago

    Hi ma'am,
    the maximum cumulative reward, by selecting the path 4, 5, 6, is 81 + 90 + 100 = 271, ma'am.

    • @sriveni · 10 months ago

      Yes, two paths are mentioned in the solution; the first path is the shortest.

    • @sriveni · 10 months ago

      The first path is 1, 2, 3 and the second path is 4, 5, 6 and 3. Both are shown in the solution graph.
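A hedged arithmetic sketch of where 81, 90 and 100 can come from: if, as assumed here, only the final move into goal 3 is rewarded (100) and γ = 0.9, these are the same single reward discounted by 0, 1 and 2 extra steps. Each Q-value already includes the discounted future reward, so adding Q-values along a path counts that goal reward several times. If the video's R matrix also rewards intermediate moves, the numbers will differ.

```python
GAMMA = 0.9  # assumed discount factor


def discounted_return(rewards, gamma=GAMMA):
    """Sum of gamma**k * r_k along one path, where r_0 is the first move's reward."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


# Assumed rewards along 4 -> 5 -> 6 -> 3: only the final move into goal 3 pays 100.
rewards_from_6 = [100]        # 6 -> 3
rewards_from_5 = [0, 100]     # 5 -> 6 -> 3
rewards_from_4 = [0, 0, 100]  # 4 -> 5 -> 6 -> 3

# Q(6,3), Q(5,6), Q(4,5): the same goal reward seen from 1, 2 and 3 moves away
print([round(discounted_return(r), 6) for r in (rewards_from_6, rewards_from_5, rewards_from_4)])
# [100.0, 90.0, 81.0]

# The reward actually collected on the whole path 4 -> 5 -> 6 -> 3 (no discounting)
print(sum(rewards_from_4))  # 100
```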

  • @rams9256 · 9 months ago

    Ma'am, can you please explain the same Q-learning concept with another example?

  • @rahuljacker7171 · 4 months ago

    How will we know when our Q matrix will no longer be updated?

    • @sriveni · 4 months ago · +1

      During updating, if the values don't change for 2 to 3 iterations, we stop.
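The stopping rule in this reply can be sketched as "repeat update sweeps until the table stays unchanged for a few sweeps in a row". The graph, the reward values, γ = 0.9, the tolerance and the patience of 2 sweeps are assumptions; the sweeps apply the update to every allowed move in turn, in the spirit of the manual example, rather than sampling random episodes.

```python
# Hypothetical 6-state graph (NOT the video's R matrix): 1-2-3 and 4-5-6-3, goal = 3.
N, GOAL, GAMMA = 6, 3, 0.9
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 3)]

R = [[-1] * N for _ in range(N)]  # -1 = move not allowed
for a, b in EDGES:
    for s, t in ((a, b), (b, a)):
        if s != GOAL:  # the goal is terminal
            R[s - 1][t - 1] = 100 if t == GOAL else 0


def solve(patience=2, tol=1e-9, max_sweeps=1000):
    """Apply Q(s,t) = R(s,t) + gamma * max Q(t,.) to every allowed move, sweep after
    sweep, and stop once `patience` consecutive sweeps change no entry by more than `tol`."""
    q = [[0.0] * N for _ in range(N)]
    quiet = 0  # consecutive sweeps without a real change
    for sweep in range(1, max_sweeps + 1):
        biggest_change = 0.0
        for s in range(N):
            for t in range(N):
                if R[s][t] >= 0:
                    new = R[s][t] + GAMMA * max(q[t])
                    biggest_change = max(biggest_change, abs(new - q[s][t]))
                    q[s][t] = new
        quiet = quiet + 1 if biggest_change <= tol else 0
        if quiet >= patience:
            return q, sweep
    return q, max_sweeps  # gave up without meeting the stopping rule


Q, sweeps = solve()
print(sweeps)  # 5: three sweeps with real changes, then two unchanged ones
print(round(Q[0][1], 6), round(Q[3][4], 6))  # 90.0 81.0
```

The `max_sweeps` cap is a safety net: on a larger or stochastic problem the values may only approach their limit, so a tolerance-based check is more robust than waiting for exact equality.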

  • @afshinmonfared187 · 29 days ago

    only you said so or you said ok.

    • @sriveni · 27 days ago

      I didn't understand your question.

  • @skayllacodm · 4 months ago

    Congratulations!