RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning

  • Published 12 May 2015
  • #Reinforcement Learning Course by David Silver# Lecture 1: Introduction to Reinforcement Learning
    #Slides and more info about the course: goo.gl/vUiyjq

COMMENTS • 324

  • @jordanburgess
    @jordanburgess 8 years ago +1269

    Just finished lecture 10 and I've come back to write a review for anyone starting. *Excellent course*. Well paced, enough examples to provide a good intuition, and taught by someone who's leading the field in applying RL to games. Thank you David and Karolina for sharing these online.

    • @Gabahulk
      @Gabahulk 8 years ago +22

      I've finished both of them, and I'd say that this one has better and much more solid content, although the one from Udacity is much lighter and easier to follow, so it really depends on what you want :)

    • @adarshmcool
      @adarshmcool 7 years ago +17

      This course is more thorough; if you are looking to make a career in Machine Learning, you should put in the work and do this course.

    • @TheAdithya1991
      @TheAdithya1991 7 years ago +6

      Thanks for the review!

    • @devonk298
      @devonk298 7 years ago +6

      One of the best, if not the best, courses I've watched!

    • @saltcheese
      @saltcheese 7 years ago +3

      thanks for the review

  • @tanmaygangwani3534
    @tanmaygangwani3534 7 years ago +52

    The complete set of 10 lectures is brilliant. David's an excellent teacher. Highly recommended!

  • @nguyenduy-sb4ue
    @nguyenduy-sb4ue 4 years ago +156

    How lucky we are to have access to this kind of knowledge at the push of a button! Thank you to everyone at DeepMind for making this course public.

    • @BhuwanBhatta
      @BhuwanBhatta 4 years ago +5

      I was going to say the same. Technology has really made our lives easier and better in a lot of ways. But a lot of the time we take it for granted.

  • @zingg7203
    @zingg7203 7 years ago +447

    0:01 Outline
    Admin 1:10
    About Reinforcement Learning 6:13
    The Reinforcement Learning problem 22:00
    Inside an RL agent 57:00
    Problems within Reinforcement Learning

    • @user-sf5ig4sz6p
      @user-sf5ig4sz6p 7 years ago +1

      Good job. Very thankful :)

    • @enochsit
      @enochsit 6 years ago +1

      thanks

    • @trdngy8230
      @trdngy8230 6 years ago +2

      You made the world much easier! Thanks!

    • @michaelc2406
      @michaelc2406 6 years ago +6

      Problems within Reinforcement Learning 1:15:53

    • @mairajamil001
      @mairajamil001 3 years ago

      Thank you for this.

  • @eyeofhorus1301
    @eyeofhorus1301 5 years ago +31

    Just finished lecture 1 and can already tell this is going to be one of the absolute best courses 👌

  • @socat9311
    @socat9311 5 years ago +21

    I am a simple man. I see a great course, I press like

  • @passerby4278
    @passerby4278 4 years ago +21

    What a wonderful time to be alive!!
    Thank God we have the opportunity to study a full module from one of the best unis in the world, taught by one of the leaders of its field.

  • @guupser
    @guupser 6 years ago +16

    Thank you so much for repeating the questions each time.

  • @Edin12n
    @Edin12n 4 years ago +5

    That was brilliant. Really helping me to get my head around the subject. Thanks David

  • @DrTune
    @DrTune 1 year ago

    Excellent moment around 24:10 when David makes it crystal clear that there needs to be a metric to train by (better/worse) and that it's possible - and necessary - to try to come up with a scalar metric that roughly approximates success or failure in a field. When you train something to optimize for a metric, it's important to be clear up-front what that metric is.

  • @AndreiMuntean0
    @AndreiMuntean0 8 years ago +38

    The lecturer is great!

  • @tylersnard
    @tylersnard 4 years ago +33

    I love that David is one of the foremost minds in Reinforcement Learning, but he can explain it in ways that even a novice can understand.

    • @5m5tj5wg
      @5m5tj5wg 3 months ago +1

      Would be weird if he couldn't. If an expert can't explain it to a novice, who can?

  • @zhongchuxiong
    @zhongchuxiong 1 year ago +9

    1:10 Admin
    6:13 About Reinforcement Learning
    6:22 Sits at the intersection of many fields of science: solving the decision-making problem in these fields.
    9:10 Branches of machine learning.
    9:37 Characteristics of RL: no correct answer, delayed feedback, sequence matters, agent influences environment.
    12:30 Example of RL
    21:57 The Reinforcement Learning Problem
    22:57 Reward
    27:53 Sequential Decision Making. Action
    29:36 Agent & Environment. Observation
    33:52 History & State: stream of actions, observations & rewards.
    37:13 Environment state
    40:35 Agent State
    42:00 Information State (Markov State). Contains all useful information from history.
    51:13 Fully observable environment
    52:26 Partially observable environment
    57:04 Inside an RL Agent
    58:42 Policy
    59:51 Value Function: prediction of the expected future reward.
    1:06:29 Model: transition model, reward model.
    1:08:02 Maze example to explain these 3 key components.
    1:10:53 Taxonomy of RL agents based on these 3 key components:
    policy-based, value-based, actor-critic (which combines both policy & value function), model-free, model-based (see the toy sketch after this list)
    1:15:52 Problems within Reinforcement Learning.
    1:16:14 Learning vs. Planning: partially known environment vs. fully known environment.
    1:20:38 Exploration vs. Exploitation.
    1:24:25 Prediction vs. Control.
    1:26:42 Course Overview
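
    To make the three components above concrete, here is a minimal toy sketch in Python (hypothetical names, not code from the lecture): a policy maps states to actions, a value function scores states, and an optional model predicts transitions.

    from collections import defaultdict
    import random

    class ToyAgent:
        """Illustrative container for the three components named in the lecture."""

        def __init__(self, actions):
            self.actions = actions
            self.policy = {}                 # policy-based agents represent this directly
            self.value = defaultdict(float)  # value-based agents learn this and act greedily w.r.t. it
            self.model = {}                  # model-based agents also predict (next state, reward)

        def act(self, state):
            # Fall back to a random action for states the policy has not covered yet.
            return self.policy.get(state, random.choice(self.actions))

    agent = ToyAgent(actions=["left", "right"])
    print(agent.act("start"))  # random action until a policy entry exists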

  • @sachinramsuran7372
    @sachinramsuran7372 4 years ago +1

    Great lecture. The examples really helped in understanding the concepts.

  • @NganVu
    @NganVu 3 years ago +20

    1:10 Admin
    6:13 About Reinforcement Learning
    21:57 The Reinforcement Learning Problem
    57:04 Inside an RL Agent
    1:15:52 Problems within Reinforcement Learning

  • @alpsahin4340
    @alpsahin4340 4 years ago

    Great lecture, great starting point. Helped me to understand the basics of Reinforcement Learning. Thanks for the great content.

  • @Esaens
    @Esaens 3 years ago

    Superb David - you are one of the giants I am standing on to see a little further - thank you

  • @dhrumilbarot1431
    @dhrumilbarot1431 6 years ago

    Thank you for sharing. It kinda inspires me to always remember that I have to pass it on too.

  • @tianmingdu8022
    @tianmingdu8022 7 years ago

    The UCL lecturer is awesome. Thx for the excellent course.

  • @asavu
    @asavu 1 year ago

    David is awesome at explaining a complex topic!

  • @vipulsharma3846
    @vipulsharma3846 4 years ago +1

    I am taking a Deep Learning course rn but seriously the comments here are motivating me to get into this one right away.

  • @ImtithalSaeed
    @ImtithalSaeed 6 years ago +79

    I can say that I've found a treasure... really

  • @nirajabcd
    @nirajabcd 3 years ago

    Just completed Coursera's Reinforcement Learning Specialization, and this is a nice addition to reinforce the concepts I am learning.

  • @kiuhnmmnhuik2627
    @kiuhnmmnhuik2627 7 years ago +2

    @1:07:00. Instead of defining P_{ss'}^a and R_s^a, it's better to define p(s',r|s,a), which gives the joint probability of the new state and reward. The latter is the approach followed by the 2nd edition of Sutton&Barto's book.
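
    For reference, the two notations describe the same MDP and are related by the standard identities (using the Sutton & Barto conventions mentioned above):

        \mathcal{P}_{ss'}^{a} = \Pr[S_{t+1} = s' \mid S_t = s, A_t = a] = \sum_{r} p(s', r \mid s, a)
        \mathcal{R}_{s}^{a} = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a] = \sum_{r} r \sum_{s'} p(s', r \mid s, a)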

  • @hassan-ali-
    @hassan-ali- 7 years ago +19

    lecture starts at 6:30

  • @ShalabhBhatnagar-vn4he
    @ShalabhBhatnagar-vn4he 4 years ago +5

    Mr. Silver covers in 90 minutes what most books do not in 99 pages. Cheers and thanks!

  • @linglingfan8138
    @linglingfan8138 3 years ago +1

    This is really the best RL course I have seen!

  • @kozzuli
    @kozzuli 7 years ago +1

    Ty for sharing, Great Lecture!!

  • @wireghost897
    @wireghost897 1 year ago

    It's really nice that he gives examples.

  • @filippomiatto1289
    @filippomiatto1289 6 years ago +1

    Amazing video, a very well-designed and well-delivered lecture! I'm going to enjoy this course, good job! 👍

  • @ethanlyon8824
    @ethanlyon8824 7 years ago +33

    Wow, this is incredible. I'm currently going through Udacity and this lecture series blows their material from GT out of the water. Excellent examples, great explanation of theory, just wow. This actually helped me understand RL. THANK YOU!!!!!

    • @JousefM
      @JousefM 4 years ago

      How do you find the RL course from Udacity? Thinking about doing it after the DL Nanodegree.

    • @pratikd5882
      @pratikd5882 3 years ago +4

      @@JousefM I agree, those explanations by GT professors were confusing and less clear, the entire DS nanodegree which had ML, DL and RL was painful to watch and understand.

  • @Abhi-wl5yt
    @Abhi-wl5yt 2 years ago

    I just finished the course, and the people in this comment section are not exaggerating. This is one of the best courses on Reinforcement learning. Thank you very much DeepMind, for making this free and available to everyone!

  • @sng5192
    @sng5192 7 years ago +1

    Thanks for a great lecture. I got to grasp the point of reinforcement learning!

  • @donamincorleone
    @donamincorleone 8 years ago +6

    Great video. Thanks. I really needed something like this :)

  • @tristanlouthrobins
    @tristanlouthrobins 3 months ago

    This is one of the clearest and most illuminating introductions I've watched on RL and its practical applications. Really looking forward to the following instalments.

  • @aam1819
    @aam1819 5 months ago

    Thank you for sharing your knowledge online. Enjoying your videos, and loving every minute of it.

  • @MGO2012
    @MGO2012 7 years ago

    Excellent explanation. Thank you.

  • @deviljin6217
    @deviljin6217 1 year ago

    the legend of all RL courses

  • @user-hb9wc7sx9h
    @user-hb9wc7sx9h 8 months ago +2

    David is awesome at explaining a complex topic! Great lecture. The examples really helped in understanding the concepts.

  • @lcswillems
    @lcswillems 6 years ago

    A really good introduction course!! Thank you very much!!

  • @mo3adhaytam771
    @mo3adhaytam771 1 year ago

    I took this playlist as a reference for my thesis in "RL for green radio".

  • @dalcimar
    @dalcimar 5 years ago +26

    Can you enable automatic captioning for this content?

  • @43SunSon
    @43SunSon 2 months ago +1

    I'm back again, watching the whole video again.

  • @zhichaochen7732
    @zhichaochen7732 7 years ago

    RL could be the killer app in ML. Nice lectures to bring people up to speed!

  • @umountable
    @umountable 6 years ago

    46:20 this also means that it doesn't matter how you got into this state; it will always mean the same thing.
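
    That is exactly the Markov property from the lecture: the future is independent of the past given the present,

        \Pr[S_{t+1} \mid S_t] = \Pr[S_{t+1} \mid S_1, \ldots, S_t]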

  • @VishalKumarTech
    @VishalKumarTech 6 years ago

    Thank you David!!

  • @HazemAzim
    @HazemAzim 2 years ago

    just amazing and different from any other intro to RL

  • @johntanchongmin
    @johntanchongmin 3 years ago +3

    Really love this video series. Watching it for the fifth time:)

  • @ABHINAVGANDHI09
    @ABHINAVGANDHI09 5 years ago

    Thanks for the question at 19:48!

  • @yuwuxiong1165
    @yuwuxiong1165 3 years ago

    Take swimming as an example: learning is the part where you jump straight into the water and learn to swim in order to survive; planning is the part where, before jumping into the water, you read books/instructions on how to swim (obviously sometimes planning helps, sometimes not, and sometimes it even counter-helps).

  • @rossheaton7383
    @rossheaton7383 5 years ago +5

    Silver is a boss.

  • @bennog8902
    @bennog8902 6 years ago +1

    awesome course and awesome teacher

  • @lauriehartley9808
    @lauriehartley9808 4 years ago +1

    I have never heard a punishment described as a negative reward at any point during my 71 orbits of the Sun. You can indeed learn something new every day.

  • @Newascap
    @Newascap 3 years ago +3

    I actually prefer this 2015 class over the most recent 2019 one. Nothing wrong with the other lecturer, but David kinda makes the course flow more smoothly.

  • @ProfessionalTycoons
    @ProfessionalTycoons 6 years ago

    amazing introduction and very cool

  • @vorushin
    @vorushin 3 months ago

    Thanks a lot for the great lectures! I enjoyed watching every one of them (even #7). This is a great complement to reading Sutton/Barto and the seminal papers in RL.
    I remember looking at the Atari paper in late 2013 and having a hard time understanding why everyone was going completely crazy about it. A few years later the trend was absolutely clear: Reinforcement Learning is the key to pushing the performance of AI systems past the threshold where humans can serve as wise supervisors, to the limit where different kinds of intelligence help each other improve via self-play.

  • @AwesomeLemur
    @AwesomeLemur 3 years ago

    We can't thank you enough!

  • @vballworldcom
    @vballworldcom 5 years ago +1

    Captions would really help here!

  • @abunickabhi
    @abunickabhi 6 years ago +1

    Excellent Indeed!

  • @yuxinzhang9403
    @yuxinzhang9403 2 years ago

    Any observation and reward could be wrapped up in an abstract data structure in an object for sorting.

  • @HoangPham-oh6re
    @HoangPham-oh6re 6 years ago +9

      Could you please turn on the auto-generated subtitles?

  • @Delta19G
    @Delta19G 7 months ago

    This is my first taste of deep mind

  • @AhmedThabit99
    @AhmedThabit99 4 years ago +5

    If you can activate the subtitles on YouTube, that would be great. Thanks!

  • @43SunSon
    @43SunSon 3 years ago +21

    I have to admit, david silver is slightly smarter than me.

  • @merajis
    @merajis 6 years ago

    I love this!

  • @aaronvr_
    @aaronvr_ 4 years ago +2

    really high quality, I'm impressed at David Silver's (or somebody else's?) choice to offer this content to the general public free of charge.. what an age we're living in :DDDDDDDDDDD

  • @rz4413
    @rz4413 5 years ago

    brilliant course

  • @dashingrahulable
    @dashingrahulable 7 years ago

    On the slide "History and State" @ 34:34, does the order of Actions, Observations and Rewards matter? If yes, then why isn't the order Observations, Rewards and Actions? The reasoning is that the agent sees the observation first, assesses the reward, and then takes a particular action. Please clarify if the chain of thought went awry at any place.
    Thanks.
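
    For what it's worth, the slide's formula does list the observation and reward before the action at each step (roughly, from the lecture slides):

        H_t = O_1, R_1, A_1, \ldots, A_{t-1}, O_t, R_t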

  • @mgonetwo
    @mgonetwo 1 year ago +1

    Rare opportunity to listen to Christian Bale after he is finished dealing with criminals as Batman.
    On a serious note, overall a great series of lectures! Thanks, prof. David Silver!

  • @legorative
    @legorative 6 years ago

    Too good :) Best analogies.

  • @TheAIEpiphany
    @TheAIEpiphany 3 years ago +1

    His name should be David Gold or Platinum I dunno. Best intro to RL on YT, thank you!

  • @erichuang2009
    @erichuang2009 4 years ago +4

    5 days to train per game back then; now it's 5 minutes to complete training, based on recent papers. Evolving fast!

  • @donnysoh5610
    @donnysoh5610 4 years ago

    Hi, in the example of the mouse pressing the lever, would that mean that the representation of the agent state determines how well the agent learns?

  • @viscaelbarca4381
    @viscaelbarca4381 2 years ago +4

    Would be great if you guys could add subtitles!

  • @vovos00
    @vovos00 7 years ago

    Thank you for the nice lecture

  • @ajibolashodipo8911
    @ajibolashodipo8911 2 years ago

    Silver is Gold!

  • @bingeltube
    @bingeltube 5 years ago +1

    Recommendable

  • @rohitsaka
    @rohitsaka 3 years ago +6

    For me: David Silver is God ❤️ What a man! What an explanation. One of the greatest minds who changed the dynamics of RL in the past few years. Thanks DeepMind for uploading this valuable course for free 🤍

  • @florentinrieger5306
    @florentinrieger5306 9 months ago

    This is so good!

  • @AlessandroOrlandi83
    @AlessandroOrlandi83 3 years ago

    Amazing teacher, I wish I could participate in this course! I did a course on Coursera, but it was so quick at explaining very complex things.

    • @pratikd5882
      @pratikd5882 3 years ago

      Are you referring to the RL specialization by Alberta university? If so, then how good was it on the programming/practical aspects?

    • @AlessandroOrlandi83
      @AlessandroOrlandi83 3 years ago

      @@pratikd5882 Yes, I did that. The exercises were good, but I'm not an AI guy, just a simple programmer. I managed to do the exercises, but I think the explanations were very concise. So in 15 minutes they explain what you get in 1 hour in these lectures. I think it is very summarized. But it's good they have exercises. So I don't think that after doing it I'm actually able to do much.

    • @satishrapol3650
      @satishrapol3650 2 years ago

      Do you have any suggestions about which one to start with, the lecture series here or the RL specialization by Alberta University (on Coursera)? I need to apply RL to my own project work. By the way, I did the Machine Learning course by Andrew Ng and I could follow the pace; it was good enough for me, and the programming exercises helped me a lot more than I could have imagined. But I am not sure whether that would be the case with the RL course on Coursera as well. Can you guide me on this?

  • @saranggawane4719
    @saranggawane4719 2 years ago

    42:00 - 47:55 : Information State/Markov State
    57:13 RL Agent

  • @jamesr141
    @jamesr141 2 years ago

    What a GIFT.

  • @fandrade9
    @fandrade9 3 years ago

    Great lecture!

  • @shivramshetty5502
    @shivramshetty5502 6 years ago

    Hi David,
    Thank you very much for such a great video course. I would very much appreciate it if you could help clear up the following:
    1. Is there a need to train by playing, say, a game hundreds of times before the agent is ready? If you have a generalized program designed for reinforcement learning, is it ready to play a new game whose rules are different, or does it need to be trained again?
    2. Is there a mathematical proof that intermediate rewards can be calculated based on past experience, so that there is no need to know the intermediate rewards? They could be derived, say, using DeepMind's Q function. Please correct me if I'm wrong. Thank you.
    3. In the helicopter example the current state depends on past states like velocity and acceleration, so is it not Markov?
    4. Can reinforcement learning realistically be used to drive a car or run a robot using a generalized program whose inputs are set up appropriately?
    Thank you,
    Sam

    • @edmonddantes4705
      @edmonddantes4705 1 year ago

      1. Hundreds of times is absolutely nothing in ML or RL. In order to learn to play games with a rich state space like go or chess, AlphaGo or AlphaZero can play billions of games. Also, by "generalised program" you mean architecture or algorithm, like algorithms based on Q-learning, concrete policy gradient methods, etc. If the rules are different, you can use the same algorithm if you reckon it is suitable, but you would have to train from scratch (unless there is a lot of similarity between the games and you can reuse some weights, as one can do with word embeddings in diverse NLP problems).
      2. You are not being rigorous about the statement of your question, so it is very hard to understand what exactly you mean. It is absolutely necessary to measure intermediate rewards.
      3. How is the helicopter state not Markov? The helicopter movement is simulated by running a controlled ODE, and a numerical solver for an ODE is Markov by definition. Of course it is Markov.
      4. Self-driving cars are an example of that.

  • @user-xh7fx6yv6q
    @user-xh7fx6yv6q 4 years ago

    Thanks for the great lecture :))
    36:36
    Question! So the state S can be any function of the history ~ f(H), including rewards for example? I mean, can we define a state with observations 'and' rewards? Is that a usual case?

    • @parthpurvesh1201
      @parthpurvesh1201 3 years ago

      Umm yes. There can be a lot of ways of defining a function, and it may or may not include evaluating the reward. See, if the reward is on a linear axis, so there is a definite score, there is no need to evaluate the function on the reward. But when the reward is more probabilistic, you do use the reward and observation as parameters of the function.

  • @AntrianiStylianou
    @AntrianiStylianou 2 years ago +2

    Can anyone confirm whether this is still relevant in 2022? I would like to study RL. It seems there is a more recent series, but with a different professor, on this channel.

  • @razzlefrog
    @razzlefrog 8 years ago

    The only slide that threw me off a bit was the RL taxonomy one. There was some confusion with the redundant labeling; otherwise it was a great lecture!

  • @RahulSharma-yx5uf
    @RahulSharma-yx5uf 2 years ago

    Thank you very much!!

  • @Paul-rs4gd
    @Paul-rs4gd 7 years ago +3

    Great course, thanks.
    I am confused about an issue. When a neural net is trained to implement Q(s, a), there is no explicit list of states. The states are vectors fed into the net, and the exact same state is unlikely to occur twice in a complex problem. So the net learns to estimate the value of Q from the rewards, which are sampled from trajectories, and will generalise between similar states. Here is my issue: the sampled rewards will have a high variance. If we train the net on an example [s1, a, r1] and then on [s2, a, r2], where s1 and s2 are 'similar' states and r1 and r2 are very different rewards, will the net consider this as variance in the reward of the 'same' state and learn the average reward (expectation), or will the net take this as two distinct states with different values of Q and learn to discriminate the states? I think backpropagation would cause a strong weight to be given to the small difference between the states, thus enabling them to return very different output values.
    I'll hypothesise an answer to my own question: I think backpropagation would learn to discriminate s1 and s2 ONLY if they recurred FREQUENTLY with the same small difference, delta i.e. always, approximately s2 = s1+delta. If r1
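
    A quick way to see the "averaging" half of this question is the sketch below (hypothetical numbers, using PyTorch; purely illustrative, not from the lectures): when the exact same input appears with two very different sampled returns, minimising squared error pushes the network's output towards their mean; if the inputs instead differ by a small but systematic delta, a sufficiently flexible net can learn to separate them.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # The same state appears twice, once with sampled return 0 and once with 10,
    # mimicking a high-variance return for a single state.
    states = torch.tensor([[1.0, 0.5],
                           [1.0, 0.5]])
    returns = torch.tensor([[0.0],
                            [10.0]])

    net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for _ in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(states), returns)
        loss.backward()
        opt.step()

    print(net(states[:1]).item())  # converges near 5.0, the mean of the sampled returns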

    • @edmonddantes4705
      @edmonddantes4705 1 year ago

      I am guessing you are considering the setting where S is a discrete space? Because if S is continuous, in the MDP setting you can prove that both the probability distribution and expected reward of the MDP at a state s_1, given a fixed policy pi, converge to those at s_2 when s_1 --> s_2 (where the convergence takes place typically in the n-dimensional Euclidean space), so sufficiently close states must have sufficiently close rewards.
      In the discrete case, when S is approximated continuously, the existence of structure in the state space is key. It is the same as in supervised learning in the sense that (1) the probability distribution of the inputs (let us call it p(x), as is usually done in the literature) tends to be concentrated around a manifold of much lower dimensionality than the dimension of x (i.e. of the original input space), and (2) nearby inputs correspond with nearby outputs, since neural networks are continuous. If there is no structure in the problem, or it occurs very regularly that close inputs correspond with very different outputs, it seems hopeless to try to solve the problem with neural nets.
      Also, I feel your question about variance in the case where we encounter very different rewards is not a question about RL anymore, since you could ask literally the same about a supervised learning problem. In a supervised learning problem, the theoretical regression function is the expectation of y (the outputs) given x (the inputs). If you have enough training data you should converge towards the regression function. In the general case, you can always look at the bias-variance tradeoff formula, which is a decent formalisation of what you are talking about.

  • @wentingwang883
    @wentingwang883 9 months ago

    Thanks so much!

  • @andyyuan97
    @andyyuan97 8 years ago

    If subtitles were provided, it would be perfect and classic~~

  • @vladimir0681
    @vladimir0681 4 years ago

    Are there lectures or assignments for the other half of the class (kernel methods)?

  • @halefomkahsay2931
    @halefomkahsay2931 4 years ago

    Great Help Thanks Man

  • @einemailadressenbesitzerei8816
    @einemailadressenbesitzerei8816 3 years ago

    I want to discuss:
    "All goals can be described by the maximisation of expected cumulative reward"
    "Do you agree with this statement?"
    My thought on why it could be controversial is that you can never specify the reward such that you will never have unexpected side effects/behaviour from the agent.
    Any other inputs/thoughts?

  • @dharambir_iitk
    @dharambir_iitk 1 year ago

    love it!

  • @user-if3jd5cm3g
    @user-if3jd5cm3g 6 years ago

    Where could I get the corresponding captions for these videos?

  • @deepschoolai
    @deepschoolai 7 years ago +13

    Err have you disabled captions on this video?

  • @yvesmatanga2242
    @yvesmatanga2242 6 years ago

    Hi, Thank you very much for this enlightening video about reinforcement learning,
    However, I wanted to ask: how can one tell the step reward after a given move in games like chess, where theoretically a good move will be assessed by the player's ability to ultimately win the game?
    Moreover, can't we say that reinforcement learning is a special type of supervised learning, because anyway we cannot get the optimal policy without the pair "action-reward"?

    • @saurabhjhanjee2408
      @saurabhjhanjee2408 5 years ago

      The state is described as the information used to make a decision, but that doesn't necessarily mean it properly represents the relevant characteristics of the environment needed to make a correct and consistent decision (i.e. the probabilities of transitioning to other states given the actions are not only dependent on the state). Hence, in reinforcement learning, a properly implemented state (one that gives us all the information required to make a decision) is Markov, but a state is not Markov by definition.

    • @edmonddantes4705
      @edmonddantes4705 1 year ago +1

      The immediate rewards in chess are always zero unless the move wins the game (i.e. unless the move is a checkmate). You are confusing value function with immediate reward. In chess, go, etc, the immediate rewards are deterministic and pretty simple, but the value function is very hard to obtain. If you follow the course carefully, all those questions are answered in detail.
      Reinforcement learning is not a type of supervised learning, although some RL algorithms convert the task into a supervised task (just a minority of methods actually, but see batch methods in lecture 6). Supervised learning starts with a training set of inputs and outputs and a loss function. It is not obvious how to convert an RL problem into those. Again, this question will be clarified to you if you watch the lectures.
      So basically all the stuff you are asking can be clarified if you just watch this very course. There is even a full lecture devoted to games.
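
      Concretely, a sparse chess-style reward signal of the kind described above might look like this toy Python helper (hypothetical names, purely illustrative):

      from dataclasses import dataclass

      @dataclass
      class Outcome:
          """Minimal stand-in for end-of-game information (illustrative only)."""
          game_over: bool
          agent_won: bool = False
          draw: bool = False

      def chess_reward(outcome: Outcome) -> float:
          # Every move in an unfinished game gets zero immediate reward;
          # only the terminal position carries +1 / 0 / -1.
          if not outcome.game_over:
              return 0.0
          if outcome.agent_won:
              return 1.0
          return 0.0 if outcome.draw else -1.0

      print(chess_reward(Outcome(game_over=False)))                  # 0.0 for an ordinary move
      print(chess_reward(Outcome(game_over=True, agent_won=True)))   # 1.0 for a winning move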

  • @raymondchua280
    @raymondchua280 7 years ago +6

    Does this course talk about Q-learning?

    • @peterliu8182
      @peterliu8182 7 years ago +3

      Raymond Chua yes, Q-learning is an algorithm for temporal-difference control, so you can just read the titles and go to that chapter directly.
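
      For a preview, the core of that chapter is the one-step Q-learning update (standard form):

        Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a') - Q(S_t, A_t) \right]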

  • @lazini
    @lazini 4 years ago +1

    Thanks very much. But I need English subtitles. Could you change the settings of these videos? :)