RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning

  • Published 24 Dec 2024

COMMENTS • 330

  • @jordanburgess
    @jordanburgess 8 years ago +1357

    Just finished lecture 10 and I've come back to write a review for anyone starting. *Excellent course*. Well paced, enough examples to provide a good intuition, and taught by someone who's leading the field in applying RL to games. Thank you David and Karolina for sharing these online.

    • @Gabahulk
      @Gabahulk 8 years ago +23

      I've finished both of them, and I'd say that this one has better and much more solid content, although the one from Udacity is much lighter and easier to follow, so it really depends on what you want :)

    • @adarshmcool
      @adarshmcool 8 years ago +17

      This course is more thorough; for someone who is looking to make a career in Machine Learning, you should put in the work and do this course.

    • @TheAdithya1991
      @TheAdithya1991 8 years ago +6

      Thanks for the review!

    • @devonk298
      @devonk298 8 years ago +6

      One of the best, if not the best, courses I've watched!

    • @saltcheese
      @saltcheese 8 years ago +3

      thanks for the review

  • @zhongchuxiong
    @zhongchuxiong 2 years ago +17

    1:10 Admin
    6:13 About Reinforcement Learning
    6:22 Sits at the intersection of many fields of science: solving the decision-making problem in these fields.
    9:10 Branches of machine learning.
    9:37 Characteristics of RL: no correct answer, delayed feedback, sequence matters, agent influences environment.
    12:30 Example of RL
    21:57 The Reinforcement Learning Problem
    22:57 Reward
    27:53 Sequential Decision Making. Action
    29:36 Agent & Environment. Observation
    33:52 History & State: stream of actions, observations & rewards.
    37:13 Environment state
    40:35 Agent State
    42:00 Information State (Markov State). Contains all useful information from history.
    51:13 Fully observable environment
    52:26 Partially observable environment
    57:04 Inside an RL Agent
    58:42 Policy
    59:51 Value Function: prediction of the expected future reward.
    1:06:29 Model: transition model, reward model.
    1:08:02 Maze example to explain these 3 key components.
    1:10:53 Taxonomy of RL agents based on these 3 key components:
    policy-based, value-based, actor-critic (which combines both policy & value function), model-free, model-based
    1:15:52 Problems within Reinforcement Learning.
    1:16:14 Learning vs. Planning. Partially known environment vs. fully known environment.
    1:20:38 Exploration vs. Exploitation.
    1:24:25 Prediction vs. Control.
    1:26:42 Course Overview

  • @tylersnard
    @tylersnard 5 years ago +36

    I love that David is one of the foremost minds in Reinforcement Learning, but he can explain it in ways that even a novice can understand.

    • @DEVRAJ-np2og
      @DEVRAJ-np2og 5 months ago

      Hello, can you please suggest a roadmap for RL?

  • @passerby4278
    @passerby4278 4 years ago +25

    What a wonderful time to be alive!!
    Thank god we have the opportunity to study a full module from one of the best unis in the world, taught by one of the leaders of the field.

  • @zingg7203
    @zingg7203 8 years ago +463

    0:01 Outline
    Admin 1:10
    About Reinforcement Learning 6:13
    The Reinforcement Learning problem 22:00
    Inside an RL agent 57:00
    Problems within Reinforcement Learning

    • @차정민-b1z
      @차정민-b1z 8 years ago +1

      Good job. Very thankful :)

    • @enochsit
      @enochsit 7 years ago +1

      thanks

    • @trdngy8230
      @trdngy8230 7 years ago +2

      You made the world much easier! Thanks!

    • @michaelc2406
      @michaelc2406 7 years ago +6

      Problems within Reinforcement Learning 1:15:53

    • @mairajamil001
      @mairajamil001 3 years ago

      Thank you for this.

  • @tga3532
    @tga3532 8 years ago +59

    The complete set of 10 lectures is brilliant. David's an excellent teacher. Highly recommended!

  • @socat9311
    @socat9311 6 years ago +28

    I am a simple man. I see a great course, I press like

  • @Abhi-wl5yt
    @Abhi-wl5yt 2 years ago +2

    I just finished the course, and the people in this comment section are not exaggerating. This is one of the best courses on Reinforcement learning. Thank you very much DeepMind, for making this free and available to everyone!

  • @NganVu
    @NganVu 4 years ago +22

    1:10 Admin
    6:13 About Reinforcement Learning
    21:57 The Reinforcement Learning Problem
    57:04 Inside an RL Agent
    1:15:52 Problems within Reinforcement Learning

  • @eyeofhorus1301
    @eyeofhorus1301 6 years ago +33

    Just finished lecture 1 and can already tell this is going to be one of the absolute best courses 👌

  • @ShalabhBhatnagar-vn4he
    @ShalabhBhatnagar-vn4he 4 years ago +6

    Mr. Silver covers in 90 minutes what most books do not in 99 pages. Cheers and thanks!

  • @vipulsharma3846
    @vipulsharma3846 5 years ago +2

    I am taking a Deep Learning course rn but seriously the comments here are motivating me to get into this one right away.

  • @nguyenduy-sb4ue
    @nguyenduy-sb4ue 5 years ago +170

    How lucky we are to have access to this kind of knowledge at the press of a button! Thank you, everyone at DeepMind, for making this course public.

    • @BhuwanBhatta
      @BhuwanBhatta 5 years ago +6

      I was going to say the same. Technology has really made our life easier and better in a lot of ways. But a lot of times we take it for granted.

  • @DrTune
    @DrTune 2 years ago +1

    Excellent moment around 24:10 when David makes it crystal clear that there needs to be a metric to train by (better/worse), and that it's possible - and necessary - to try to come up with a scalar metric that roughly approximates success or failure in a field. When you train something to optimize for a metric, it's important to be clear up-front what that metric is.

  • @tristanlouthrobins
    @tristanlouthrobins 11 months ago +1

    This is one of the clearest and most illuminating introductions I've watched on RL and its practical applications. Really looking forward to the following instalments.

  • @elichen
    @elichen 2 months ago +1

    I'm really appreciating the intuitive style of this course, as contrasted to the Stanford course.

  • @guupser
    @guupser 6 years ago +18

    Thank you so much for repeating the questions each time.

  • @deviljin6217
    @deviljin6217 1 year ago +2

    the legend of all RL courses

  • @ethanlyon8824
    @ethanlyon8824 7 years ago +36

    Wow, this is incredible. I'm currently going through Udacity and this lecture series blows their material from GT out of the water. Excellent examples, great explanation of theory, just wow. This actually helped me understand RL. THANK YOU!!!!!

    • @JousefM
      @JousefM 4 years ago

      How do you find the RL course from Udacity? Thinking about doing it after the DL Nanodegree.

    • @pratikd5882
      @pratikd5882 4 years ago +4

      @@JousefM I agree, those explanations by GT professors were confusing and less clear, the entire DS nanodegree which had ML, DL and RL was painful to watch and understand.

  • @JustinArmstrong-u5w
    @JustinArmstrong-u5w 1 year ago +4

    David is awesome at explaining a complex topic! Great lecture. The examples really helped in understanding the concepts.

  • @dbdg8405
    @dbdg8405 4 months ago +1

    This is a superb course on so many levels. Thank you

  • @yuwuxiong1165
    @yuwuxiong1165 4 years ago +1

    Take swimming as an example: learning is the part where you jump straight into the water and learn to swim to survive; planning is the part where, before jumping into the water, you read books/instructions on how to swim (obviously sometimes planning helps, sometimes not, and sometimes it even gets in the way).

  • @lauriehartley9808
    @lauriehartley9808 4 years ago +2

    I have never heard a punishment described as a negative reward at any point during my 71 orbits of the Sun. You can indeed learn something new every day.

  • @nirajabcd
    @nirajabcd 4 years ago

    Just completed Coursera's Reinforcement Learning Specialization, and this is a nice addition to reinforce the concepts I am learning.

  • @linglingfan8138
    @linglingfan8138 3 years ago +1

    This is really the best RL course I have seen!

  • @kiuhnmmnhuik2627
    @kiuhnmmnhuik2627 7 years ago +2

    @1:07:00. Instead of defining P_{ss'}^a and R_s^a, it's better to define p(s',r|s,a), which gives the joint probability of the new state and reward. The latter is the approach followed by the 2nd edition of Sutton & Barto's book.
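
    For reference, assuming the standard MDP definitions, the two notations are related by marginalizing the joint distribution p(s',r|s,a):

    P_{ss'}^a = \Pr[S_{t+1}=s' \mid S_t=s, A_t=a] = \sum_r p(s',r \mid s,a)
    R_s^a = \mathbb{E}[R_{t+1} \mid S_t=s, A_t=a] = \sum_{s'} \sum_r r \, p(s',r \mid s,a)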

  • @vorushin
    @vorushin 11 months ago

    Thanks a lot for the great lectures! I enjoyed watching every one of them (even #7). This is a great complement to reading Sutton & Barto and the seminal papers in RL.
    I remember looking at the Atari paper in late 2013 and having a hard time understanding why everyone was going completely crazy about it. A few years later the trend was absolutely clear. Reinforcement learning is the key to pushing the performance of AI systems past the threshold where humans can serve as wise supervisors, to the point where different kinds of intelligence help each other improve via self-play.

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago +1

    His name should be David Gold or Platinum I dunno. Best intro to RL on YT, thank you!

  • @aam1819
    @aam1819 1 year ago

    Thank you for sharing your knowledge online. Enjoying your videos, and loving every minute of it.

  • @AndreiMuntean0
    @AndreiMuntean0 8 years ago +39

    The lecturer is great!

  • @hassan-ali-
    @hassan-ali- 8 years ago +20

    lecture starts at 6:30

  • @mdoIsm771
    @mdoIsm771 1 year ago

    I took this playlist as a reference for my thesis in "RL for green radio".

  • @Newascap
    @Newascap 4 years ago +4

    I actually prefer this 2015 class over the most recent 2019 one. Nothing wrong with the other lecturer, but David kinda makes the course flow more smoothly.

  • @johntanchongmin
    @johntanchongmin 4 years ago +3

    Really love this video series. Watching it for the fifth time :)

  • @asavu
    @asavu 2 years ago

    David is awesome at explaining a complex topic!

  • @tianmingdu8022
    @tianmingdu8022 8 years ago

    The UCL lecturer is awesome. Thx for the excellent course.

  • @wireghost897
    @wireghost897 1 year ago

    It's really nice that he gives examples.

  • @43SunSon
    @43SunSon 10 months ago +1

    I'm back again, watching the whole video again.

  • @Dynamyalo
    @Dynamyalo 5 months ago +1

    Right now I am sitting in my pajamas in the comfort of my home, eating a peanut butter and jelly sandwich and I have the ability to watch an entire course about an advanced topic online for free. What a time to be alive

  • @Esaens
    @Esaens 4 years ago

    Superb David - you are one of the giants I am standing on to see a little further - thank you

  • @mgonetwo
    @mgonetwo 1 year ago +1

    Rare opportunity to listen to Christian Bale after he has finished dealing with criminals as Batman.
    On a serious note, overall great series of lectures! Thanks, prof. David Silver!

  • @zhichaochen7732
    @zhichaochen7732 8 years ago

    RL could be the killer app in ML. Nice lectures to bring people up to speed!

  • @saranggawane4719
    @saranggawane4719 2 years ago

    42:00 - 47:55 : Information State/Markov State
    57:13 RL Agent

  • @Edin12n
    @Edin12n 5 years ago +5

    That was brilliant. Really helping me to get my head around the subject. Thanks David

  • @ImtithalSaeed
    @ImtithalSaeed 6 years ago +80

    I can say that I've found a treasure... really.

  • @erichuang2009
    @erichuang2009 4 years ago +4

    5 days to train per game. Now it's 5 minutes to complete a training run, based on recent papers. Evolving fast!

  • @dalcimar
    @dalcimar 5 years ago +26

    Can you enable the automatic captioning for this content?

  • @SphereofTime
    @SphereofTime 4 months ago +1

    1:09:43 value function example

  • @43SunSon
    @43SunSon 4 years ago +22

    I have to admit, David Silver is slightly smarter than me.

  • @alpsahin4340
    @alpsahin4340 5 years ago

    Great lecture, great starting point. Helped me understand the basics of Reinforcement Learning. Thanks for the great content.

  • @daaaniel21
    @daaaniel21 7 years ago

    Thank you for sharing. It kinda inspires me to always remember that I have to pass it on too.

  • @rossheaton7383
    @rossheaton7383 6 years ago +5

    Silver is a boss.

  • @aj_shod
    @aj_shod 3 years ago

    Silver is Gold!

  • @Delta19G
    @Delta19G 1 year ago

    This is my first taste of DeepMind.

  • @sng5192
    @sng5192 8 years ago +1

    Thanks for a great lecture. I got to grasp the point of reinforcement learning!

  • @sachinramsuran7372
    @sachinramsuran7372 5 years ago +1

    Great lecture. The examples really helped in understanding the concepts.

  • @rohitsaka
    @rohitsaka 4 years ago +5

    For me: David Silver is God ❤️ What a man! What an explanation. One of the greatest minds who changed the dynamics of RL in the past few years. Thanks, DeepMind, for uploading this valuable course for free 🤍

  • @umountable
    @umountable 6 years ago

    46:20 this also means that it doesn't matter how you got into this state; it will always mean the same thing.

  • @yehu7944
    @yehu7944 7 years ago +70

    Could you please turn on the auto-generated subtitles?

    • @주동욱-l9j
      @주동욱-l9j 6 years ago +1

      Plz..

    • @Zebra745
      @Zebra745 6 years ago +9

      As a learner of reinforcement learning, you should become an agent and improve yourself by collecting rewards in this environment.

  • @HazemAzim
    @HazemAzim 3 years ago

    Just amazing, and different from any other intro to RL.

  • @bennog8902
    @bennog8902 7 years ago +1

    awesome course and awesome teacher

  • @SphereofTime
    @SphereofTime 3 months ago +1

    1:08:02 transition model & reward model

  • @yuxinzhang9403
    @yuxinzhang9403 3 years ago

    Any observation and reward could be wrapped up into an abstract data structure in an object for sorting.

  • @iblaliftw
    @iblaliftw 2 years ago

    Thank you very much, I recently got a good grade in RL thanks to your great teaching skills!!

  • @bocao3491
    @bocao3491 5 years ago

    Awesome! This is succinct and clarifies some concepts that I was confused about.

  • @mehershrishtinigam5449
    @mehershrishtinigam5449 1 year ago

    Important point at 1:00:30
    1:00:22 gamma's value is less than 1
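
    For context, the quantity being discounted there is the value function from the lecture slide, the expected discounted future reward; a sketch in the usual notation:

    v(s) = \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s]

    With bounded rewards, \gamma < 1 keeps this infinite sum finite.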

  • @AlessandroOrlandi83
    @AlessandroOrlandi83 4 years ago

    Amazing teacher, I wish I could participate in this course! I did a course on Coursera, but it went through very complex things too quickly.

    • @pratikd5882
      @pratikd5882 4 years ago

      Are you referring to the RL specialization by the University of Alberta? If so, how good was it on the programming/practical aspects?

    • @AlessandroOrlandi83
      @AlessandroOrlandi83 4 years ago

      @@pratikd5882 Yes, I did that. The exercises were good, but I'm not an AI guy, just a simple programmer. I managed to do the exercises, but I think the explanations were very concise. In 15 minutes they explain what you get in 1 hour in these lectures, so it's very summarized. But it's good that they have exercises. So I don't think that after doing that I'm actually able to do much.

    • @satishrapol3650
      @satishrapol3650 2 years ago

      Do you have any suggestions about which one to start with, the lecture series here or the RL specialization by the University of Alberta (on Coursera)? I need to apply RL in my own project work. By the way, I did the Machine Learning course by Andrew Ng and I could follow the pace; it was good enough for me, and the programming exercises helped me a lot more than I could have imagined. But I am not sure the same would be the case with the RL course on Coursera. Can you guide me on this?

  • @taherhabib3180
    @taherhabib3180 3 years ago

    His 2021 "Reward is Enough" paper makes us agree with the Reward Hypothesis @ 24:18. :D

  • @AntrianiStylianou
    @AntrianiStylianou 3 years ago +2

    Can anyone confirm whether this is still relevant in 2022? I would like to study RL. It seems that there is a more recent series on this channel, but with a different professor.

  • @prashanthduvvuri7845
    @prashanthduvvuri7845 4 years ago +2

    The future is independent of the past given the present
    - David Silver

    • @utsabshrestha277
      @utsabshrestha277 4 years ago

      Only if the state is Markov (see the note at the end of this thread).

    • @prashanthduvvuri7845
      @prashanthduvvuri7845 4 years ago

      The above comment was meant in the context of your life. Your brain is a cumulation of all your prior experiences, and the choices/decisions you make are actions taken by your brain (which is a Markov state). So what I took from that statement was: "you can forget your past and move on".
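
      For reference, the Markov property being quoted is, as stated on the lecture slide:

      \Pr[S_{t+1} \mid S_t] = \Pr[S_{t+1} \mid S_1, \dots, S_t]

      i.e. the state captures everything in the history that matters for the future.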

  • @AhmedThabit99
    @AhmedThabit99 5 years ago +5

    If you can activate the subtitles on YouTube, that would be great. Thanks!

  • @jamesr141
    @jamesr141 3 years ago

    What a GIFT.

  • @timothdev
    @timothdev 5 years ago

    The agent state is the agent's internal representation, a summary of past observations, used to decide what action to take given the observation from the environment.
    How then, at 51:19, can the agent state = the observation state? I don't get this.
    The agent state is the internal representation, isn't it? How can it be the input to the agent itself (the observation state)?

    • @dantealexis7835
      @dantealexis7835 5 years ago +1

      At 51:19 he is speaking of when the agent can fully observe all variables in the environment, and presumably does not need to alter the representation of these variables. In such a case the agent state IS literally the environment state.
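
      In the lecture's notation, full observability means the observation, agent state and environment state coincide, which is the setting he calls a Markov decision process:

      O_t = S_t^a = S_t^e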

  • @pp-1954
    @pp-1954 11 months ago

    I wonder whether the Markov property is the same as Newtonian mechanics in physics, which is deterministic like Markov states. Newtonian mechanics says that if you know the momentum and acceleration of every particle in existence, you can calculate/know the future 100%.

    • @pp-1954
      @pp-1954 11 months ago

      Revision:
      If the Markov state refers to the "instant" moment in time (delta t -> 0), I guess you cannot figure out the velocity or momentum, so it is different from the Newtonian case.

  • @dashingrahulable
    @dashingrahulable 7 years ago

    On the slide "History and State" @ 34:34, does the order of actions, observations and rewards matter? If yes, then why isn't the order observations, rewards and actions? The reasoning being that the agent sees the observation first, assesses the reward for possible actions, and then takes a particular action. Please clarify if the chain of thought went awry anywhere.
    Thanks.

  • @filippomiatto1289
    @filippomiatto1289 7 years ago +1

    Amazing video, a very well-designed and well-delivered lecture! I'm going to enjoy this course, good job! 👍

  • @MimJim6784
    @MimJim6784 3 years ago

    Please enable the auto subtitle generator!

  • @_jiwi2674
    @_jiwi2674 3 years ago

    At 30:45, isn't the agent getting the reward after taking the action? It's not taking the action based on the reward it receives.
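
    A minimal self-contained sketch of the interaction loop may help with the ordering question; this toy environment and random policy are purely illustrative (not from the lecture), and show that the agent picks A_t from the current observation O_t, while the reward R_{t+1} for that action arrives together with the next observation:

    import random

    class CorridorEnv:
        # Toy 1-D corridor; the goal is at position 3.
        def reset(self):
            self.pos = 0
            return self.pos                         # initial observation O_1

        def step(self, action):                     # action in {-1, +1}
            self.pos = max(0, self.pos + action)
            reward = 1.0 if self.pos == 3 else 0.0  # R_{t+1} for the action just taken
            done = self.pos == 3
            return self.pos, reward, done           # next observation O_{t+1}, reward, terminal flag

    env = CorridorEnv()
    obs = env.reset()
    for t in range(20):
        action = random.choice([-1, +1])        # A_t depends on O_t, not on a reward not yet received
        obs, reward, done = env.step(action)    # the reward follows the action
        if done:
            break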

  • @MGO2012
    @MGO2012 7 years ago

    Excellent explanation. Thank you.

  • @life42theuniverse
    @life42theuniverse 2 years ago

    The environment state S_t^e is Markov ... though it's unknowable (to the agent).

  • @lcswillems
    @lcswillems 7 years ago

    A really good introduction course!! Thank you very much!!

  • @donamincorleone
    @donamincorleone 9 years ago +6

    Great video. Thanks. I really needed something like this :)

  • @smilylife7515
    @smilylife7515 3 years ago

    Please add subtitles to make it more helpful for those of us from non-English-speaking countries.

  • @ABHINAVGANDHI09
    @ABHINAVGANDHI09 5 years ago

    Thanks for the question at 19:48!

  • @jy2883
    @jy2883 5 years ago +8

    Is it possible to add subtitles or autogenerated captions to these lecture videos?

  • @AwesomeLemur
    @AwesomeLemur 3 years ago

    We can't thank you enough!

  • @kozzuli
    @kozzuli 8 years ago +1

    Thanks for sharing, great lecture!!

  • @florentinrieger5306
    @florentinrieger5306 1 year ago

    This is so good!

  • @kushsharma7614
    @kushsharma7614 5 years ago

    Important fact ----> At the time I am writing this, there are around 700K views on Lecture 1 but only 35K on Lecture 10. So there's only about a 5% chance that you'll watch all the lectures. So be patient and remember why you are starting. Thanks

  • @ayeshavlogsfun
    @ayeshavlogsfun 2 years ago +1

    Are these lectures updated?

  • @ProfessionalTycoons
    @ProfessionalTycoons 6 years ago

    amazing introduction and very cool

  • @robbyrayrab
    @robbyrayrab 3 years ago +1

    What was the bit about 15 Hz?

  • @pratikkumarbulani8903
    @pratikkumarbulani8903 3 years ago +1

    Are these slides available? If yes, please share the link to access those slides

  • @weiw1028
    @weiw1028 3 years ago +1

    Begging for subtitles

  • @halibite
    @halibite 8 years ago

    1:09:26 I just don't quite understand the value of each state. Where does this value come from? From human observation and experience? But what if, think about the helicopter, the helicopter is in a totally new and never-before-seen environment? How do we get this initial state value?

    • @vincentfortin3225
      @vincentfortin3225 4 years ago

      Random initialization. It is updated once it is visited.
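
      A minimal tabular sketch of "random initialization, updated once visited" (illustrative only, not the lecture's code), using a TD(0)-style update:

      import random
      from collections import defaultdict

      gamma = 0.9      # discount factor
      alpha = 0.1      # step size
      V = defaultdict(lambda: random.uniform(-0.01, 0.01))   # unseen states get a random initial value

      def td_update(state, reward, next_state):
          # Nudge V(state) toward the observed reward plus the discounted value of the next state.
          target = reward + gamma * V[next_state]
          V[state] += alpha * (target - V[state])

      # Example: after visiting state "A", receiving reward 0 and landing in state "B":
      td_update("A", 0.0, "B")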

  • @abhijeetghodgaonkar
    @abhijeetghodgaonkar 6 years ago +1

    Excellent Indeed!

  • @lazini
    @lazini 5 years ago +1

    Thanks very much, but I need English subtitles. Could you change the settings of these videos? :)

  • @deepschoolai
    @deepschoolai 7 years ago +13

    Err have you disabled captions on this video?

  • @abrahamyalley9973
    @abrahamyalley9973 1 year ago

    Now starting... I want to know if there are any coding aspects in this course.

  • @84xyzabc
    @84xyzabc 4 years ago

    At 43:33, shouldn't H_{1:t} be H_{t:1}, in line with the far-right expression of H?