MIT 6.S191 (2023): Reinforcement Learning

  • Published 2 Jun 2024
  • MIT Introduction to Deep Learning 6.S191: Lecture 5
    Deep Reinforcement Learning
    Lecturer: Alexander Amini
    2023 Edition
    For all lectures, slides, and lab materials: introtodeeplearning.com
    Lecture Outline:
    0:00 - Introduction
    3:49 - Classes of learning problems
    6:48 - Definitions
    12:24 - The Q function
    17:06 - Deeper into the Q function
    21:32 - Deep Q Networks
    29:15 - Atari results and limitations
    32:42 - Policy learning algorithms
    36:42 - Discrete vs continuous actions
    39:48 - Training policy gradients
    47:17 - RL in real life
    49:55 - VISTA simulator
    52:04 - AlphaGo and AlphaZero and MuZero
    56:34 - Summary
    Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!
  • Science & Technology

COMMENTS • 76

  • @BehindTheBackground
    @BehindTheBackground 4 months ago +1

    Excellent slides and explanations!

  • @mehmetburakguldogan6815
    @mehmetburakguldogan6815 1 year ago +5

    Very good work. Seen many lectures on the topic but this is by far the best one and very intuitive. Thank you for sharing.

  • @imZoox
    @imZoox 6 months ago +2

    Haha, at 19:50 William Lin, the CP legend, is answering the question :D
    It's so strange: I'm not even from the US, nor do I study there, but I recognized a student at MIT by his voice in an MIT online lecture :D

  • @nikteshy9131
    @nikteshy9131 1 year ago +5

    Wow, thank you very much! )) 🥰🥰😊

  • @cyrusmobini1321
    @cyrusmobini1321 1 year ago +6

    Great as always, thanks for being consistent

    • @bohanwang-nt7qz
      @bohanwang-nt7qz 4 months ago

      Hey, I'd like to introduce you to my AI learning tool, Coursnap, designed for youtube courses! It provides course outlines and shorts, allowing you to grasp the essence of 1-hour in just 5 minutes. Give it a try and supercharge your learning efficiency!

  • @muhammadalikhan5003
    @muhammadalikhan5003 5 months ago +4

    Amazing lecture delivery. No words to thank you for sharing this wonderful resource for free. Thanks, MIT as well.

  • @nageshwararaov118
    @nageshwararaov118 1 year ago +1

    Thank you very much. 😊

  • @TheEgesko
    @TheEgesko 1 year ago +1

    Great video! 🙏

  • @user-ff9hy6qz5t
    @user-ff9hy6qz5t 6 months ago +6

    Thank you so much! I loved the lecture, and I'm learning so much!
    I'm only 16 now, but I hope I can one day get into MIT or another great university that teaches this well!

  • @sirabhop.s
    @sirabhop.s 1 year ago +1

    Thank you so much

  • @yuqiwang3296
    @yuqiwang3296 1 year ago +2

    great thanks for the course!❤

  • @franco-parra
    @franco-parra 5 months ago +2

    Great lecture. To be precise, at 24:37, you propose the 'target' as a function of the best action a' in some state s', but you don't explicitly define where this s' comes from. I may be mistaken, but I believe that this s' essentially represents the state s in the next step (t+1), as demonstrated in ua-cam.com/video/wDVteayWWvU/v-deo.html (at 14:45). I hope this information is useful to someone.
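
The relationship this comment describes can be sketched in a few lines (a toy, framework-free sketch; the names `q_target`, `q_table`, and `s_next` are my own, not the lecture's):

```python
# Toy sketch of the Q-learning target discussed above: s' (here s_next)
# is simply the state observed after taking action a in state s at step t,
# so the target for Q(s, a) is r + gamma * max over a' of Q(s_next, a').

def q_target(reward, s_next, q_table, actions, gamma=0.99, done=False):
    """Bootstrapped target for Q(s, a) after a transition (s, a, r, s_next)."""
    if done:                      # terminal transition: no future value
        return reward
    best_future = max(q_table[(s_next, a)] for a in actions)
    return reward + gamma * best_future

# Tiny example: one next state, two actions.
q = {("s1", "left"): 0.0, ("s1", "right"): 1.0}
print(q_target(reward=0.5, s_next="s1", q_table=q,
               actions=["left", "right"], gamma=0.9))
```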

  • @ReeceGao
    @ReeceGao 8 months ago

    It is so clear. Thank you very much!

  • @pavalep
    @pavalep 1 year ago +8

    Thanks for explaining complex Deep Learning and Reinforcement principles
    in a simplistic manner 🙌👍

  • @hilbertcontainer3034
    @hilbertcontainer3034 1 year ago +4

    Wow, my favorite area of AI =]
    Can't wait to finish the lecture

  • @esthertschache
    @esthertschache 6 months ago

    Great video!

  • @xzuanaja2746
    @xzuanaja2746 11 months ago +12

    This is so great! but unfortunately due to my limited English, I didn't understand some parts. Hopefully in the future there will be subtitles in Indonesian or other languages, thank you very much!

    • @master7738
      @master7738 5 months ago +1

      You can use subtitles if you want.

  • @khalidalsaleh3858
    @khalidalsaleh3858 1 year ago +1

    Thanks!

  • @prithvishah2618
    @prithvishah2618 1 year ago +2

    Thank you so much :)

  • @jennifergo2024
    @jennifergo2024 5 months ago

    Thanks for sharing!

  • @seanwalsh358
    @seanwalsh358 8 months ago +1

    Great lecture from a great instructor.

  • @blas.duarte
    @blas.duarte 1 year ago +1

    Great!

  • @saprogrammer2702
    @saprogrammer2702 9 months ago

    Dude, this guy did such a good job!!!!

  • @vahidg1500
    @vahidg1500 8 months ago

    Thank you, Ostad Amini! But how can I find some code examples for policy learning, like PPO?

  • @sapienspace8814
    @sapienspace8814 1 year ago +2

    @ 50:00 Very impressive work, VISTA!

  • @kritsaphongphuthibpaphaisi1509

    Great lecture

  • @agenticmark
    @agenticmark 3 months ago

    Glad to see ML can figure out what I did as an 8 year old with a stack of quarters :D

  • @MrMonkeyMana
    @MrMonkeyMana 11 months ago +3

    Can you teach AI to play Cities: Skylines?

  • @MrPejotah
    @MrPejotah 1 year ago +2

    Once again, a great lecture. I have a challenge, and I wonder if you can help me. I'm currently implementing a NN to determine customer satisfaction from a set of inputs that capture behavioural patterns (think # of complaints with our customer service, rate of usage of our services, etc.), and I'd like to know how much each input I'm using contributes to the overall satisfaction score. I imagine this would involve computing the gradient of the output node (a single one in this case) with respect to each input. Is there any lecture where you go into the details of this, both the math and the TensorFlow code? Thanks in advance!
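
One common way to get at this question is gradient-based saliency: differentiate the output score with respect to each input. Below is a hand-rolled sketch for a one-layer model (my own illustration, not the course's code; in TensorFlow you would typically obtain the same gradients with tf.GradientTape, and the feature names here are hypothetical):

```python
import math

# Saliency-style attribution for a tiny model y = sigmoid(w . x + b):
# the gradient dy/dx_i = sigmoid'(z) * w_i measures how sensitive the
# satisfaction score is to input i near this particular customer x.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def attributions(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    s = sigmoid(z)
    ds = s * (1.0 - s)            # derivative of sigmoid at z
    return [ds * wi for wi in w]  # dy/dx_i for each input feature

# Hypothetical features: [# of complaints, rate of service usage]
w, b = [-0.8, 1.2], 0.1
print(attributions(w, b, x=[2.0, 0.5]))
```

For a multi-layer network the principle is the same, only the gradient is computed by backpropagation instead of this closed form.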

  • @smftrsddvjiou6443
    @smftrsddvjiou6443 5 months ago

    I recommend Barto Sutton „Reinforcement Learning“, 1st Edition, way,way better than the newer 2nd Edition.

  • @stef154
    @stef154 11 months ago +1

    Your videos on this channel are much appreciated; they help me so much, since I am teaching everything to myself. I am currently teaching myself DQN and I am stuck on one question.
    The only thing I don't understand is how the Q-network avoids converging to an incorrect target, given that the target network is updated much more slowly. I understand that the target network is updated more slowly to be more stable, but wouldn't the Q-network just converge to an incorrect target, because the network providing the estimate of future values (the target network) is updated more slowly and so stays inaccurate for longer?
    Your help would be much appreciated! Thanks again for the awesome videos!

    • @Jerryfan271
      @Jerryfan271 7 months ago

      For a longer series of lectures, you could try David Silver's. That one has some more information about policy and value iteration, if that is what you are talking about. (In basic Q-learning there is only one network; not sure what you mean by the target network here.)

    • @smftrsddvjiou6443
      @smftrsddvjiou6443 5 months ago

      By choosing non-optimal actions with nonzero probability, to explore strategies that might not look optimal yet but have not yet been explored, like a new variant line in a chess game. It is called the exploration-exploitation dilemma. As far as I know, the target network is not updated (aside from the discount term, which contains a Q value).
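
On the original question about the target network: it is not meant to be "correct", only stable, so the online Q-network regresses toward a fixed target between syncs, and both drift toward the true values as real rewards keep entering the target. A minimal tabular sketch of the periodic-hard-update scheme (a toy of my own, not the lecture's code):

```python
import copy, random

# Toy deep-Q-style loop on a one-state, two-action bandit. q plays the
# role of the online network (here just a dict), target_q is a frozen
# copy refreshed every SYNC_EVERY steps; targets are computed from
# target_q, while updates always go into q.

random.seed(0)
ACTIONS, GAMMA, LR, SYNC_EVERY = [0, 1], 0.9, 0.1, 20
q = {a: 0.0 for a in ACTIONS}
target_q = copy.deepcopy(q)

def reward(a):                   # action 1 is strictly better
    return 1.0 if a == 1 else 0.2

for step in range(1000):
    a = random.choice(ACTIONS)                      # uniform exploration
    target = reward(a) + GAMMA * max(target_q.values())
    q[a] += LR * (target - q[a])                    # move q toward target
    if step % SYNC_EVERY == 0:
        target_q = copy.deepcopy(q)                 # periodic hard sync

print(q[1] > q[0])   # the better action ends up with the higher Q-value
```

Even though the target network lags, the real reward term in every target keeps pulling both networks toward the true values, so the lag costs speed, not correctness.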

  • @herikaniugu
    @herikaniugu 7 months ago

    RL is so good for optimizing the trading strategies

  • @jiunyen5586
    @jiunyen5586 1 year ago

    Thanks for the thorough vid! I'm a bit lost @ 39:31 on where the "-0.8" velocity comes from. The closest interpretation I have: given mean = -1 and var = 0.5, the probability of the normal distribution at the mean would be about 0.8... and since you're going in the negative direction for action a, it becomes -0.8?? But this interpretation seems wrong, since the mean should indicate the direction and velocity of action a, while the probability is for computing the loss. So... what am I missing here? Thanks!

    • @gnikhil335
      @gnikhil335 1 year ago

      When you say "the prob of normal distribution at mean would be around 0.8", where did you get 0.8 from? (The maximum value of this distribution is 0.564, at the mean.) And secondly, I think he is using 0.8 m/s as an example (it's a random value which you might get after mapping it back to a speed variable in your game).

    • @jiunyen5586
      @jiunyen5586 1 year ago

      @@gnikhil335 Good call! I misused that variance for std. My mistake. And I also really should've said likelihood there. But yeah, really I was just trying to figure out why he said the mean is centered at -0.8 but also shows a mean of -1 for the predicted params of pdf. As in are they just separate random examples or are we using a pdf with mean=-1, var=0.5 to determine the prob when speed is -0.8, which also doesn't seem likely since I thought we would use the velocity with the max likelihood (i.e. mean).
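
For what it's worth, in continuous-action policy gradients the network's mean and variance define a Gaussian over velocity; an action such as -0.8 is one *sample* from that distribution, and training weights the sample's (log-)likelihood by the return. A sketch with the numbers discussed in this thread (assuming mean -1 and variance 0.5 are the intended parameters):

```python
import math, random

# The policy head outputs mu and var of a Normal over velocity. An
# action is sampled from it (it could well come out near -0.8), and the
# loss uses the density pi(a|s) of that sampled action.

def gaussian_pdf(a, mu, var):
    return math.exp(-(a - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, var = -1.0, 0.5
random.seed(1)
sampled_action = random.gauss(mu, math.sqrt(var))   # one possible velocity
density_at_sample = gaussian_pdf(-0.8, mu, var)     # likelihood of -0.8
density_at_mean = gaussian_pdf(mu, mu, var)         # peak, ~0.564 as noted above

print(round(density_at_mean, 3), round(density_at_sample, 3))
```

So the mean (-1) and the sampled velocity (-0.8) are two different things: the mean parameterizes the distribution, while -0.8 is simply one draw from it whose likelihood enters the loss.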

  • @madhusudhanreddy9157
    @madhusudhanreddy9157 1 year ago

    Hi Alex,
    Could you please suggest a good online coding platform that works properly for reinforcement learning? On our local systems we are getting errors (system dependencies).
    Even Google Colab shows an error when using the gym library.
    Thanks,
    Your YouTube follower

    • @xpcalc446
      @xpcalc446 1 year ago

      Have you tried to solve those errors by installing the correct versions of the packages?

  • @Gabcikovo
    @Gabcikovo 1 year ago +1

    54:38

  • @forheuristiclifeksh7836
    @forheuristiclifeksh7836 1 month ago +1

    7:00

  • @ojasvisingh786
    @ojasvisingh786 1 year ago +3

    👏👏

  • @DonReichSdeDios
    @DonReichSdeDios 6 months ago

    An apple with a byte❤
    ✒️ fellow August 13th🤳🏿

  • @Achielezz
    @Achielezz 4 months ago

    You say state-action-pear but show an apple, I AM CONFUSION! AMERICA EXPRAIN! :) Loved the lecture, really well done.

  • @forheuristiclifeksh7836
    @forheuristiclifeksh7836 1 month ago +1

    14:25

  • @user-xd4cl3qd8x
    @user-xd4cl3qd8x 10 months ago +7

    Oh my God, he is so handsome. And your speech, lecture delivery, and fluency in RL are as awesome as your looks... 🤩 I'm focusing on the speaker more than the slides. May Allah Almighty bless you, man.

  • @SantoshKumar-hx2ig
    @SantoshKumar-hx2ig 1 year ago

    Lecture 7 ?

    • @AAmini
      @AAmini  1 year ago

      Lecture 7 is having some technical difficulties so it will be published tomorrow same time (10am ET) -- sorry for the delay!

    • @SantoshKumar-hx2ig
      @SantoshKumar-hx2ig 1 year ago +1

      @@AAmini I am very happy to get a reply within a few minutes.
      Today I feel the power of MIT.

    • @AAmini
      @AAmini  1 year ago

      Thank you for your understanding :)

  • @smftrsddvjiou6443
    @smftrsddvjiou6443 5 months ago

    Now he knows that Q values can be converted into a probability?

  • @roadto300kusdbtc7
    @roadto300kusdbtc7 1 year ago

    Once again, the audio is super quiet. I had to turn the volume up to 100. Fire the audio guy lol

  • @davidkamran9092
    @davidkamran9092 10 months ago

    SEALCLATCONTITOIN - YALL NEED TO INCORPORATE HARD-CODED TRAJETORIES LIKE POLITICAL VIEWS IN DEEP LEARNING .. THE SYSTEM DYNAMICS CHANGE BASED ON POLITICAL MODALITIES

  • @shojintam4206
    @shojintam4206 10 months ago

    33:13

  • @pravachanpatra4012
    @pravachanpatra4012 10 months ago

    16:03