MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)

  • Published 29 Dec 2024

COMMENTS • 146

  • @lexfridman  6 years ago +346

    Deep RL is my favorite subfield of AI, because it asks some fundamental questions about what it takes to build safe and intelligent robots that operate in the real world. So many open problems and interesting challenges to solve!

    • @thesk8erdav  6 years ago +3

      we love you Lex!

    • @farhadsafaei1910  6 years ago +3

      It's my favorite one, too. Thanks for the lecture, I did enjoy a lot watching it.

    • @colouredlaundry1165  6 years ago +3

      With these lectures and interviews you are sharing and creating immense value: knowledge. Thank you!

    • @dklvch  6 years ago +1

      Thank you Lex, awesome presentation!

    • @liuculiu8366  6 years ago +1

      love your spirit in sharing the latest information. appreciate!

  • @NakedSageAstrology  2 years ago +56

    I wish you still did videos like this, we appreciate you sharing such knowledge.

  • @KeepingUp_withAI  6 years ago +26

    Deep RL is the field that excites me the most. Thank you Lex.

  • @kawingchan  5 years ago +7

    I really like that tongue in cheek chuckle when Lex talked about that multiverse and whoever created it.....

  • @Techieadi  6 years ago +68

    Thank you for bringing these lectures to us.

  • @wendersonj  5 years ago +8

    Since 2017, Lex has improved his lessons spectacularly! Now (2019), I watch a more fluid video with the feeling that this guy knows exactly what he's talking about, without hesitating. Once again, thanks, Lex, for sharing these videos. Congratulations and thanks from Brazil.

  • @samuelschmidgall2090  5 years ago +11

    Seriously the best Deep RL lecture out there to date.

  • @nova2577  5 years ago +32

    "Every type of machine learning is supervised learning", cannot agree more!!!

    •  4 years ago

      In fact, learning itself is a supervised process; otherwise it is acquiring, not learning.

  • @akarshrastogi3682  5 years ago +9

    1:04:40 Best part: that grin after he just casually dropped that line in an MIT lecture... all of the infinite universes being simulations

  • @sivaa6130  6 years ago +6

    Every lecture has historical context, evolution, mathematics and inspiration, a technical overview, and a network-architecture overview. Well summarized!!

  • @ronaldolum464  8 months ago

    Certainly, one of the best videos on deep learning I have come across.

  • @akarshrastogi3682  5 years ago +12

    Professor Lex, can we get the entirety of 6.S091 on MIT OCW? This is an incredibly interesting topic; I've been working on Evolutionary Computing and am currently enrolled in a project with thorough knowledge of Deep RL as a requisite. This research field has very few online resources besides Stanford's CS 234 and Berkeley's CS 285.
    Your explanations are immensely helpful and intuitive. Humanity will present its gratitude if this whole course is made available! AGI and AI safety issues need more attention before AGI becomes the greatest immediate existential risk; your courses can help raise general AI awareness and advance our civilization to higher dimensions. Loved the fact that you grinned while just casually mentioning the Simulation Hypothesis..

  • @danielvelazquez4472  6 years ago +11

    Haha, he says "that is super exciting" without being excited! He is a robot!
    Thanks for the open lectures

  • @judedavis92  2 years ago +1

    Loved the lecture. Definitely recommend his podcast. Quality.

  • @DennisZIyanChen  4 years ago +1

    I honestly don't care about AlphaGo or Dota 2 or the robots, I just cannot get over how incredible the thought structure behind this is. What I mean by thought structure is the strategy behind quantifying the right things, asking the right questions, and modeling the policy upon which growth can be created. IT IS SICK

  • @tarunpaparaju5382  5 years ago +2

    I have tried to study and understand Deep RL using several books and lectures over the last few years, but I only felt like I understood something in RL after listening to this lecture. Thanks, Lex. I am grateful to you for posting this lecture on YouTube. Thank you!

  • @amandajrmoore3216  2 years ago

    As always, Lex, a generous share, which will be a useful resource for loads of folks. Thanks.

  • @MistaSmilesz  5 years ago

    I've seen a lot of these videos & read some of the books in ML; Lex has a clarity that's rare

  • @charlesotieno6309  5 years ago

    Thanks Lex!! Deep Reinforcement Learning opens up a new world. Life is not that complex, like the baby in your video taking his first steps... unsupervised learning. Take into account the amount of time and effort (brains + USD) of getting an AI to do what the baby is doing: WALK in a few days, and in the years to come, be a professor and continue with this subject.
    The baby is the moral of the story... what we are doing is not working... we need a radical way of thinking... Your radical way is the way forward

  • @mrr5183  3 years ago

    I appreciate the philosophical insights sprinkled throughout the lecture!

  • @ArghyaChatterjeeJony  5 years ago +2

    Lex Fridman, I just love your videos. I am your great fan sir. Carry on.

  • @vast634  4 years ago +3

    Important detail when trying to transfer from a simulation to the real world: give the simulation many random variations in its behavior/mechanics at runtime (such as drag, gravity, friction, size of the agent, random perturbations, etc.). This forces the agent to generalize rather than over-optimize on the details of the sim, and makes it easier to transfer the agent's capabilities to a real-world environment.

    • @user-sc8ph2ds2m  2 years ago

      gravity is fake buddy ;)

    • @vast634  2 years ago

      @@user-sc8ph2ds2m Take a brick, stand still, throw it straight up, then you can observe if gravity exists, or not. Very simple experiment to administer.

    • @user-sc8ph2ds2m  2 years ago

      @@vast634 you will experience buoyancy 🤦
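
The domain randomization described at the top of this thread can be sketched as follows (a hypothetical minimal example; the parameter names and ranges are illustrative, not taken from the lecture):

```python
import random

# Hypothetical sketch of domain randomization: sample fresh physics
# parameters at the start of every episode so the agent cannot overfit
# to one fixed simulator configuration.
def sample_physics(rng):
    return {
        "gravity": rng.uniform(8.8, 10.8),      # around the nominal 9.81 m/s^2
        "friction": rng.uniform(0.5, 1.5),      # surface friction scale
        "agent_mass": rng.uniform(0.8, 1.2),    # body-mass scale
        "drag": rng.uniform(0.0, 0.2),          # air resistance
        "perturb_std": rng.uniform(0.0, 0.05),  # random force noise per step
    }

rng = random.Random(0)
episodes = [sample_physics(rng) for _ in range(1000)]
# Every episode now sees different dynamics; with a configurable
# (hypothetical) environment you would do env.reset(**episodes[i]).
```

The key design choice is that the randomization ranges bracket the real system's unknown parameters, so the real world looks like just another sample from the training distribution.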

  • @chinbold  5 years ago

    I like his lecture because it's more understandable. And I also like his tone.

  • @learnbydoingwithsteven  1 month ago

    Back again to enjoy the lecture.

  • @Arghamaz  1 year ago

    This is interesting for me, as this is my favorite: mathematics and statistics combined with algebraic equations 🎉 MATHEMATICS is the Best Subject in the World 🌎 👌 ❤🎉🎉

  • @merebhayl5826  2 years ago

    I like how you quoted many theorems from Dostoevsky and also a few axioms from Nietzsche's texts

    • @merebhayl5826  2 years ago

      I had never seen Lex's lecture videos other than the philosophical podcasts. This is my first. And I just wrote the above comment as a joke without seeing the video and three minutes in, I found Socrates, Kant, Nietzsche... 😂😂 That's very Lex👌

  • @abdulrahmankerim2377  6 years ago +4

    One of the best lectures, I have ever watched ....Keep it up.

  • @neutrinocoffee1151  6 years ago +6

    Loved this lecture. I learned a lot. Thank you.

  • @CarlosGutierrez-go9hq  1 year ago +1

    since I began my journey into data science, machine learning, and AI, I have been seeing patterns. Am I the only one who sees that it is probable we are just programs seeking a never-ending end of this simulation? The way that Q-learning is formulated is the most realistic comparison to human thought, so in order to maximize my output, do I have to reconsider my reward mechanism? (taking some info from Huberman also)

  • @AviaEfrat  4 years ago +1

    27:24 - There is no "reload" in Doom =)

  • @bryanbocao4906  5 years ago +1

    It would be appreciated if anyone could give the specific steps for deriving all the directions on the map from 18:51 to 21:32 in great detail.
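
For reference, the arrows on such a grid map come from value iteration followed by a greedy read-out. A minimal sketch (assuming a 3x3 grid with a single goal and a cost of -1 per move; this is not the lecture's exact layout):

```python
# Value iteration on a tiny deterministic gridworld; the greedy action
# at each cell is the "direction" drawn on such policy maps.
GAMMA = 0.9
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE, GOAL = 3, (0, 2)

def step(s, a):
    # Deterministic transition; bumping into a wall leaves you in place.
    r, c = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else s

V = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
for _ in range(50):  # sweep the Bellman update until values converge
    for s in V:
        if s != GOAL:
            V[s] = max(-1 + GAMMA * V[step(s, a)] for a in ACTIONS)

# Greedy policy: the arrow shown at each non-goal cell
policy = {s: max(ACTIONS, key=lambda a: V[step(s, a)]) for s in V if s != GOAL}
```

Each sweep propagates value outward from the goal, so after convergence every arrow points along a shortest path; flipping the sign of the per-step reward is what produces the "stay away as long as possible" behavior mentioned around 21:20.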

  • @bayesianlee6447  6 years ago +5

    Lex, I heard that DL professionals are now using simulations with nature-based environments to teach AI agents, e.g. making an agent learn how to walk or run by itself.
    Yoshua Bengio said the next evolution will be based on simulation environments for AI.
    Would you have any ideas or information to share about that?
    I really, really appreciate all your work and the spirit you have. Everyone in the world with an interest in AI appreciates your work and sharing. Thank you! :)

    • @borispyakillya4777  5 years ago

      Do you mean something like Gym-based simulations? MuJoCo is based on physical laws; you can already train in it with RL methods

  • @stmandl  5 years ago +2

    Hi Lex, thanks for this great lecture! Which books of Nietzsche did you have on your mind around 4:33?

  • @samlaf92  5 years ago

    @50:06 DQN can't learn stochastic policies. DQN has a softmax output on actions... isn't that a stochastic policy in itself?

  • @heinrichwonders8861  6 years ago +3

    I have been waiting for this.

  • @samferrer  5 years ago

    Another detail I have noticed in many presentations ... those agents are not trying to model the environment ... that is semantically impossible ... what they are trying to do instead, I believe, is to model AN INSTANCE OF A DUAL SPACE associated with the environmental space. It is very common to use linear regressions, for instance ...

    • @samferrer  5 years ago

      Kevvy Kim hmmm ... we are saying the same thing ... it seems that practitioners and lecturers keep it short without realizing, perhaps, the big conceptual gap that is being created.

  • @sofina527  11 months ago

    very helpful, thanks a lot dear prof.

  • @msamogh96  4 years ago +2

    This guy is a better Siraj Raval.

  • @emilecureau  2 years ago

    "when the reward flips, the optimal path is grad school, taking as long as possible and never reaching the destination....pffff" lol 21:20

  • @LidoList  5 years ago

    Very good explanation of RL, thanks to the speaker!

  • @jonk.3947  5 years ago

    Love the Digital Physics reference at 1:04:00 :)

  • @liberator328  5 years ago +1

    Which Nietzsche book is he recommending at 4:12 ?

  • @hansharajsharma2765  5 years ago

    Love this. Thanks Lex.

  • @ruinsaneornot  5 years ago +27

    30:30 "you know, MIT does better than Stanford that kind of thing" xD

  • @oldPrince22  3 years ago

    very good lecture! Thanks.

  • @noname76787  2 years ago

    thank you so much for the lecture!

  • @mrektor  6 years ago +1

    Amazing work. Excellent lecture

  • @AbhishekKumar-mq1tt  6 years ago +1

    Thank u for this awesome video

  • @Asmutiwari  4 years ago +2

    Amazing lecture on DRL. Can you also show us how we can implement the Q function as a neural network?
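
A minimal sketch of one way to do this (hypothetical dimensions, plain NumPy instead of a DL framework, and not the lecture's code): the network outputs one Q-value per action, and each update regresses the taken action's Q-value toward the TD target r + γ·max over a' of Q(s', a').

```python
import numpy as np

# One-hidden-layer Q-network: state vector in, one Q-value per action out.
rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 4, 2, 16   # illustrative sizes
GAMMA, LR = 0.99, 0.05

W1 = rng.normal(0.0, 0.1, (STATE_DIM, HIDDEN))
W2 = rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS))

def q_values(s):
    h = np.maximum(0.0, s @ W1)   # ReLU hidden layer
    return h, h @ W2              # Q(s, a) for every action a

def td_update(s, a, r, s_next, done):
    """One gradient step on the squared TD error of the taken action."""
    global W1, W2
    h, q = q_values(s)
    target = r if done else r + GAMMA * q_values(s_next)[1].max()
    err = q[a] - target
    grad_q = np.zeros(N_ACTIONS)
    grad_q[a] = err                              # only the taken action's head
    dW2 = np.outer(h, grad_q)                    # backprop through output layer
    dW1 = np.outer(s, (W2 @ grad_q) * (h > 0))   # backprop through ReLU layer
    W1 -= LR * dW1
    W2 -= LR * dW2
    return err

# Repeating one terminal transition drives Q(s, 0) toward the target 1.0
s = rng.normal(size=STATE_DIM)
first = abs(td_update(s, 0, 1.0, s, True))
for _ in range(200):
    last = abs(td_update(s, 0, 1.0, s, True))
```

Feeding `td_update` transitions collected from an environment, plus experience replay and a separate target network, is what turns a sketch like this into full DQN.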

  • @datta97  4 years ago

    Thanks for the last slide.

  • @benyaminewanganyahu  1 year ago

    This guy should do podcasting.

  • @Lorkin32  5 years ago +3

    Much better than the Stanford University lecture, where the lady basically just reads the equations without giving any real intuition about what's going on.

  • @junxu147  3 years ago

    Great lecture!

  • @eeee8677  5 years ago

    THANK YOU MIT

  • @konouzkartoumeh  5 years ago

    Great lecture! Thank you.

  • @Lunsterful  6 years ago +1

    Excellent talk.

  • @jasonabc  5 years ago

    Really great lecture, learned a lot

  • @yu-siangwang1818  6 years ago

    Great overview of DRL

  • @kaneelsenevirathne7085  3 years ago

    I took the engineering plasma class taught by your dad at Drexel :D

  • @OEFarredondo  5 years ago +2

    Remove the human factor. Have the traffic be free of human crossings

  • @benaliamima9903  3 years ago

    Thank you for this amazing video. I want to know if I can use the DRL principle to enhance the QoS requirements in vehicular networks??
    Any suggestions??

  • @alec1975  2 years ago

    very good intro

  • @stabgan  5 years ago

    You are my idol, Lex

  • @kevinayers7144  3 years ago

    Is the entire deep RL course available?

  • @kaiwang2924  5 years ago

    Wonderful lecture.

  • @el_lahw__el_khafi  2 months ago

    where are the rest of the lectures?

  • @sauravsingh9177  2 years ago

    check out "Spinning Up in Deep RL" by OpenAI

  • @johnmacleod7789  5 years ago

    Brilliant!!

  • @scorpion7434  5 years ago +1

    The funniest part is where he was trying to explain the ability of human brains by evolution at 6:33! He literally said "it is somehow being encoded", which contradicts the rewards concept he is introducing!
    Son, the most logical reason for having a predefined encoding scheme that has never been trained is the existence of a creator!

  • @abhaysap  5 years ago

    Can we take ideas or clues from biomimicry architecture in reinforcement learning?

  • @onwrdandupwrd5303  3 years ago

    that DeepRL animation looks like something out of Bamzooki

  • @caizifeng  5 years ago

    great lecture

  • @putzz67767  5 years ago

    very good!!

  • @jeanjacqueslundi3502  4 years ago

    Are we really morally equipped to build AI that is safe, and to build it for the right reasons?
    This is my problem with contemporary science/technology... We don't focus on whether we SHOULD do something. Just because it's doable doesn't mean it should be made.

  • @inaamilahi5007  3 years ago

    Awesome

  • @deeplearningpartnership  6 years ago +2

    Nice.

  • @sarathrnair9499  6 years ago +1

    Why is no one asking any questions? Or were those portions edited out? Nice lecture

  • @nisman.lo.desvivieron  1 year ago

    27:07 lex is scared of Doom

  • @Twgvlogs539  5 years ago +1

    Super

  • @Lorkin32  5 years ago +1

    How/why can you even upload this for free? Doesn't university cost loads in the US?
    Great stuff though!

    • @m3awna  5 years ago

      I guess that's because MIT is focusing more on workshops/hands-on learning, AND to raise the bar for other universities/institutes... hhh

    • @petevenuti7355  2 years ago

      But if a diploma is your goal, it is sometimes helpful to sit in on a class before you take it for credit; it can make it easier, but sometimes it just makes it boring and counterproductive the second time around.

    • @petevenuti7355  2 years ago

      Sitting in doesn't get you credits or a diploma.

  • @rorylennon  2 years ago

    Nice vijeo...

  • @aabkhcdcz6067  5 years ago

    Thank you very much

  • @msp9331  4 years ago

    Isn't that the guy from Joe Rogan's podcast? It takes me a week to grasp what he says in 5 minutes.

  • @thepalad1n197  5 years ago +3

    oh shit i listen to your podcast lmao

  • @reinerwilhelms-tricarico344  2 years ago

    Couldn't always follow. Was distracted by the two cats and then later by the fool who fell in the water. 🙂

  • @OldGamerNoob  5 years ago +1

    My naive perception is that every frame of "video" entering each of our eyes, and every second of sensory data we receive from birth, constitutes a rather large dataset for our brains to train on (with the possibility to constantly train on it and update the network)

    • @mutyaluamballa  5 years ago

      Yes, but my perception is that the brain is already a model trained on the data of all our ancestors: at the time of birth we have a trained model with all the necessary weights, but without the dataset it was trained on (our ancestors' lives), which can then be retrained on the go based on our experiences. : )

    • @kawingchan  5 years ago

      I think this may be mostly true for other mammals: the less intelligent, the more hard-wired. When it comes to humans, I'm not so sure how much we rely on genetic wiring vs. neural plasticity, a.k.a. training. Not sure if any ethical experiments can bring insight.

  • @ProfessionalTycoons  6 years ago

    very good

  • @ns4235  3 years ago

    just create a large number of random simulations. if you're successful in a large number of other realities then this one should be easy. o_o

  • @vincentschmitt392  3 years ago

    nice tie

  • @samferrer  5 years ago

    I am having a hard ... very hard time believing that the brain uses backpropagation as its learning mechanism ... it just makes no sense in a space-time-governed universe ... god damn good lecture ... by the way ...

    • @MrPeregrineFalcon  5 years ago

      Lex doesn't say the brain uses it (he says it's a mystery). More generally, most cognitive neuroscientists don't believe it does, although some think there are similar biological correlates. But it's a very efficient algorithm for ANNs to perform gradient descent.

    • @petevenuti7355  2 years ago +2

      As far as I know, biological brains don't use backpropagation. But there are neural circuits where the flow of information goes the opposite way. There is also the chemical side of things, integrating many levels of homeostasis, from hunger to pain to emotion.
      I would say the combination of those two are the mysterious correlates of backpropagation, with backpropagation being the obviously oversimplified version.

    • @samferrer  2 years ago

      @@petevenuti7355 got it ...

  • @pittyconor2489  4 years ago

    nice

  • @abhiastronomy  4 years ago

    Nice yo

  • @fizzfox8886  4 years ago

    the robots won't be happy to see that we kicked them in our labs instead of being friendly :/

  • @skyfeelan  3 years ago

    34:12

  • @arsh2489  10 months ago

    2:15

  • @midishh  4 months ago

    hugest*

  • @rikelmens  5 years ago

    Lex is super low on cortisol and super high on GABA. So much so that he sounds quite sleepy sometimes.

  • @guilhermeparreiras8467  5 years ago +3

    I could bet he is a fan of Jordan Peterson.

    • @ryanvb3452  5 years ago

      What makes you think so?

  • @spinLOL533  6 years ago

    Insert comment

  • @Mark-vv8by  5 years ago

    there are a lot fewer viewers than on the first clip