Open-Ended Learning Leads to Generally Capable Agents | Results Showreel

  • Published 13 May 2024
  • This is the accompanying video for the paper "Open-Ended Learning Leads to Generally Capable Agents" (DeepMind 2021). All results shown are with the same agent and on hand-authored probe tasks that were held out of training.
    Further reading blog: deepmind.com/blog/article/gen...
    Paper: deepmind.com/research/publica...
    Timestamps of Results:
    00:00 Intro
    00:24 Tag Fiesta
    01:23 King of the Hill
    02:18 Hide and Seek
    04:22 Capture the Flag
    06:57 Catch 'em All
    07:53 Choose Wisely
    08:47 Nowhere to Hide
    09:33 Stop Roll
    10:43 Near not Near
    11:53 See not See
    12:42 Build Ramp
    14:26 Outro
  • Science & Technology

COMMENTS • 180

  • @Xavier-es4gi
    @Xavier-es4gi 2 роки тому +14

    - Working on that must be so much fun
    - The behavior of zapping each other in the cooperative game feels so human-like

  • @user-hh2is9kg9j
    @user-hh2is9kg9j 2 роки тому +181

    A narrator would increase the quality of these videos 10 fold.

    • @fernandolener1106
      @fernandolener1106 2 роки тому +7

      They could use AI to speak...

    • @PeterJnicol
      @PeterJnicol 2 роки тому +8

      It's bizarre. White text on white background and no narration.

    • @JohnnyJAndersen
      @JohnnyJAndersen 2 роки тому +1

      ua-cam.com/video/uuzow7TEQ1s/v-deo.html Is almost that :)

    • @epicmatter3512
      @epicmatter3512 2 роки тому +1

      I encourage you to watch two-minute papers that covered this topic with narration.

  • @parsarahimi71
    @parsarahimi71 2 роки тому +60

    So this was what DeepMind was doing while I was stuck on my paper for a year and a half ...

  • @TheJysN
    @TheJysN 2 роки тому +89

    This is amazing. Almost like children who play a game for the first time.
    As a great person, whose name nobody can pronounce, often says: imagine what this looks like two papers down the line.

    • @surreal_dreams
      @surreal_dreams 2 роки тому +12

      "Dear fellow scholars, this is 2 minute papers with Dr. Karoly Zsolnai-Feher"
      Me: Karol Jonaï Fahir ?

    • @Zen_Power
      @Zen_Power 2 роки тому +8

      Squeeze your papers!!!!

    • @josephs2137
      @josephs2137 2 роки тому

      @@Zen_Power and if you're still holding them, -squeeze- your papers again because here's what our NEW method looks like!

  • @ZergYinYang
    @ZergYinYang 2 роки тому +92

    Would love some human narrative, or music, or sound effects or something.

    • @authentic229.14
      @authentic229.14 2 роки тому

      Would love it if they would stop improving this thing that can wipe out humanity.

    • @hherpdderp
      @hherpdderp 2 роки тому

      Yakety Sax?

  • @casperguo7177
    @casperguo7177 2 роки тому +64

    Well, now we know that our mechanical overlords would dearly love spawn-killing us if it comes to that.
    On a more serious note, it would be quite interesting to see agents demonstrating some cooperative behaviors in other game scenarios.

    • @domdj9476
      @domdj9476 2 роки тому +6

      'Catch em all' and 'choose wisely' are both collaborative

    • @LetsDark
      @LetsDark 2 роки тому +11

      @@domdj9476 The goal was, but the agents haven't learned to behave cooperatively. They killed each other constantly in 'Catch em all'.

    • @mimszanadunstedt441
      @mimszanadunstedt441 2 роки тому +3

      They should do a resource-trading simulation where they get rewarded for every object on their side and every unique object, but each only has access to certain items and can drop items on an escalator or something to give to the other. Then we could watch trade learning happen. It'd be neat if they learned to withhold items until they see the other approach with a held item they need, then both drop the items at the same time to trade. But some might feed items constantly and get taken advantage of, etc., so there should be a limited number of items: +2 reward for each unique object and +1 for each subsequent one on your territory. You could make a variation where they can also steal items from each other, or choose not to because fighting gives them both less in the end. That's how you teach cooperation.
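
      A minimal sketch of the trade-reward scheme proposed above, assuming the commenter's hypothetical "+2 per unique object type, +1 per extra copy" scoring (purely illustrative, not from the paper):

      ```python
      from collections import Counter

      def territory_reward(objects_on_territory):
          """Score a list of object-type names currently on one agent's side:
          +2 for the first copy of each type, +1 for every additional copy."""
          reward = 0
          for obj_type, count in Counter(objects_on_territory).items():
              reward += 2 + (count - 1)
          return reward

      # Example: two unique types plus one duplicate cube -> 2 + 2 + 1 = 5
      print(territory_reward(["cube", "pyramid", "cube"]))  # 5
      ```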

    • @dannygjk
      @dannygjk 2 роки тому

      Cooperative behaviors have been done.

  • @idanmuze8905
    @idanmuze8905 2 роки тому +7

    It's crazy how generally effective the training was. Many of the games were completely different!

  • @frederik3326
    @frederik3326 2 роки тому +2

    Love how they tagged each other even when they could work together. It's crazy that they are actually AIs

  • @Eric_Cartman001
    @Eric_Cartman001 2 роки тому +42

    The time has come. Reward really is enough!

    • @xsuploader
      @xsuploader 2 роки тому +5

      How much reward though? Is scaling these agents with 1000x more compute enough for AGI, or are we like one billionth of the way there?

    • @Gallowglass7
      @Gallowglass7 2 роки тому +4

      @@xsuploader Idk but I imagine quantum computing will be a game changer. This is gonna happen a lot quicker than we thought.

    • @relicwarchief640
      @relicwarchief640 2 роки тому +4

      @@Gallowglass7 I know close to nothing about quantum computing. Why do you say quantum computing will be game-changing? Just curious so I can go google a few keywords

    • @Eric_Cartman001
      @Eric_Cartman001 2 роки тому +4

      @@relicwarchief640 Keywords: quantum supremacy, quantum threshold theorem, quantum error correction, fault-tolerant quantum computation.
      Not NISQ (Noisy Intermediate-Scale Quantum) computers but universal quantum computers will be truly revolutionary. If the best supercomputer we could have is a rocket engine, a quantum computer is like a warp drive or a wormhole.
      If scientists find a way to reduce quantum error below the threshold, by building fault-tolerant systems with millions of physical qubits, we will enter the age of the universal quantum computer.
      Google and IBM are trying to build million-qubit quantum computers by 2030.

    • @relicwarchief640
      @relicwarchief640 2 роки тому +3

      @@Eric_Cartman001 ty for the keywords and explanation. I'll do some reading on that stuff

  • @anishupadhayay3917
    @anishupadhayay3917 8 місяців тому

    Brilliant

  • @bayesianlee6447
    @bayesianlee6447 2 роки тому +1

    Congrats. Reward seems to be the true treasure we have all been looking for.

  • @neddreadmaynard
    @neddreadmaynard 2 роки тому +36

    Awesome demonstration. But some audio would be nice. Thought my TV was broken.

    • @petyrbaelish1216
      @petyrbaelish1216 2 роки тому +4

      We need a narrator.

    • @LouisChiaki
      @LouisChiaki 2 роки тому +1

      @@petyrbaelish1216 I am sure DeepMind got an AI model for that ;)

    • @petyrbaelish1216
      @petyrbaelish1216 2 роки тому

      @@LouisChiaki sure wish they would use it.

  • @johnnieblair2325
    @johnnieblair2325 2 роки тому +16

    Conclusion from the blog article -
    "By developing an environment like XLand and new training algorithms that support the open-ended creation of complexity, we’ve seen clear signs of zero-shot generalization from RL agents. Whilst these agents are starting to be generally capable within this task space, we look forward to continuing our research and development to further improve their performance and create ever more adaptive agents."
    When will someone utilize this zero-shot generalization approach in dialogue agents? Seems like this should be a priority, so that DeepMind researchers can then verbally interact with the agents they're trying to improve.

    • @RoobertFlynn
      @RoobertFlynn 2 роки тому +4

      Hard to implement this kind of self-play reinforcement learning in NLP. If you have a proper understanding of language, you basically have AGI.

  • @tristanwegner
    @tristanwegner 2 роки тому +23

    What's interesting is the rotation while translating when not focused on an object, to constantly scan 360 degrees. I wonder if the walking speed was the same in all directions, and which tradeoff the agents would choose if forward movement were faster than the other directions.

    • @sSsero
      @sSsero 2 роки тому +4

      It could also be that the agent defaults into the same action if there is no difference, since usually there is no incentive for agents to stay still.

    • @paulw491
      @paulw491 2 роки тому

      @@sSsero The problem with the agent is that if there is no difference, then the agent thinks it makes no difference and ends up inventing a difference to make itself different.

  • @ToloLP
    @ToloLP 2 роки тому +33

    Some background music would've been nice.

  • @mrpisarik
    @mrpisarik 2 роки тому +30

    Is there a dedicated team at DeepMind that produces all those videos, graphics and animations in their papers? Or is it done by researchers as well? I want to do the same myself, but it takes me an eternity to produce one moderate-looking picture. Any advice on how to produce scientific videos and pictures? I have a CS degree, and know C++ (4 years) and Python (5 years) quite well. What is the best scientific-graphics production process/pipeline for one person? (preferably within a Linux environment)

    • @sangnguyen-sv2lj
      @sangnguyen-sv2lj 2 роки тому +3

      You might want to try out Reddit or the likes if you couldn't get an answer here. Good luck.

    • @shadowkiller0071
      @shadowkiller0071 2 роки тому +5

      I'm like 99% sure they have a team doing illustrations and editing. The common software for these things would probably be your best bet: Illustrator, After Effects, Premiere, etc.

    • @TheCalcaholic
      @TheCalcaholic 2 роки тому +8

      That video in particular looks like they built or integrated their training environment into a game engine.
      Otherwise, when it comes to mere video editing, you can use any sufficiently sophisticated video editor (e. g. kdenlive on Linux).
      If you really need complex visualizations, that's not easy to learn - but anyways, have a look at this talk: ua-cam.com/video/686nsgx8Yd4/v-deo.html

    • @mrpisarik
      @mrpisarik 2 роки тому +1

      ​@@TheCalcaholic I have looked for this topic "blender for scientists", but somehow overlooked that video. Thanks a lot!

    • @hb-youtube
      @hb-youtube 2 роки тому +1

      Good luck. In the long term, I hope you and other techies work on an agent, by 2030 say, that one can speak with and describe what one wants created and what format one's scientific data is in (or multiple formats, specified). It could create thumbnails and then converse with you to make sure it understands what you want; it could say "I can create a Blender-based image that meets your requirements A, B, C in ___ minutes, then show you and ask you how it looks, should I?" It would create tools on the spot (like interactive sliders you shift when just your voice responses aren't enough); it would even find images on the web: "Here, I found images X, Y, Z, to make sure I understand the type of texture you verbally described. Which, if any, of them has a matching texture (or close enough to what you need)? If none, please give me more verbal input." Sound ambitious? I would urge everyone to aim for this and even more by 2030.
      If we don't aim high, we'll maybe not make as much progress. If we aim for the moon, maybe we'll at least get to Earth orbit, to mix two common metaphors. If we still don't have the above and more by 2035 or 2040, I'll be disappointed.
      My background is in STEM but I'm not a software engineer/programmer. Consider these suggestions a tiny contribution. I remember in the early 1980s thinking to myself: if only games could be taught to know the basic laws of physics and derive things from that (gravity, flammability, wetness, etc.) instead of having to be told what happens in so many possible cases. Fast forward to today: I'm not a gamer at all, but I do know that on some level, at least partially, we have such things today. Aim high and encourage others to do so too :-) Meanwhile, I do wish you good luck with scientific videos with today's software.

  • @domdj9476
    @domdj9476 2 роки тому +1

    I want to know why / where / when they learned the spinning in capture the flag. Perhaps throwing but I'm not convinced.

    • @NoHandleToSpeakOf
      @NoHandleToSpeakOf 2 роки тому

      Perhaps they just hope the flag will get to the reward floor quicker. Or they are trying to confuse the opponent. I would be.

    • @AndCaffeine
      @AndCaffeine 2 роки тому +2

      It's likely to search for the opponent. The agents can only see what's in front of them.

  • @Nullpersona
    @Nullpersona 2 роки тому +2

    Maybe the algorithm is very strictly focused on action-reaction, without the perceptual bandwidth to cooperate or recognize opportunities for external influence.
    Anthropomorphically, there seems to be an almost frantic tone to the behaviors; like there is a timer running out that we cannot see, but that the sprites treat as deletion.

    • @drdca8263
      @drdca8263 2 роки тому +2

      I imagine that there’s either a time preference thing, or something where the sooner they achieve the goal, the longer the amount of time that they are getting reward, and so the more total reward they get, so they are trained to achieve the goal quickly.
      Also, in a competitive/conflict game, being slow can leave more opportunities for the other player(s)...
      Though, maybe part of it is that they don’t, like, gain much precision by doing things slowly, and don’t have an energy use penalty.
      I haven’t read the paper/preprint though, so I’m just speculating.
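
      For what it's worth, the "achieve the goal sooner, collect reward for longer" point can be made concrete with a toy calculation (this only illustrates the reasoning in the reply above; the paper's actual reward definition may differ):

      ```python
      EPISODE_LENGTH = 900  # hypothetical number of timesteps in an episode

      def total_reward(t_goal, r_per_step=1.0):
          """Reward accumulated if the goal predicate first becomes true at
          timestep t_goal and stays satisfied until the episode ends."""
          return max(0, EPISODE_LENGTH - t_goal) * r_per_step

      print(total_reward(100))  # 800.0 -- finishing early earns much more
      print(total_reward(800))  # 100.0 -- finishing late earns little
      ```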

    • @Nullpersona
      @Nullpersona 2 роки тому

      @@drdca8263 I hope a designer develops a healthier learning environment mechanic, like "interest" where sprites can observe success from different angles, and plan.
      A method of I/O for inter-sprite communication, and sufficient processing capability to facilitate consideration and complex interaction without impeding action.

  • @charleshoots4720
    @charleshoots4720 2 роки тому +3

    What a time to be alive!

  • @MAJ0RTOM
    @MAJ0RTOM 2 роки тому +1

    2:10 It's over, Anakin, I have the high ground.

  • @jamesgrist1101
    @jamesgrist1101 2 роки тому +1

    Hopefully we'll see some application of this GAI in games we can relate to. The games in this vid are so dry, it is not obvious to viewers that this agent is more intelligent, general or applicable than DeepMind's best GAI from 3 years ago.

  • @Revel4tions
    @Revel4tions 2 роки тому

    Do the individual agents learn from reward separately?

  • @thearchitect5405
    @thearchitect5405 2 роки тому

    I wonder if they could be taught to understand objectives and progression that can be applied to story games. Like if there were a basic setup, 1 mission, and it's completely randomized each time with random areas, objectives, game mechanics, etc. to eventually be able to coherently play through a story game from start to finish without previous knowledge of the particular game. Like Halo, or an open world story game being a higher end example.

  • @eckroattheeckroat4246
    @eckroattheeckroat4246 2 роки тому

    this is amazing, but can we throw some stock music on top of the next one?

  • @MaxLohMusic
    @MaxLohMusic 2 роки тому +2

    Maybe soon it will be able to play counter-strike (while handicapped by human-level coordination and reaction time)

    • @dannygjk
      @dannygjk 2 роки тому +1

      DeepMind made a StarCraft II AI which plays at GM level.

    • @MaxLohMusic
      @MaxLohMusic 2 роки тому

      @@dannygjk I know about that. And I was asking even when it came out, when it would play CS because I happen to understand the strategy of CS more than StarCraft. But... What I am really waiting for, is for it to play an open world game like Skyrim and actually understand what it's doing

    • @dannygjk
      @dannygjk 2 роки тому

      @@MaxLohMusic Skyrim is MMORPG?

    • @MaxLohMusic
      @MaxLohMusic 2 роки тому

      @@dannygjk No, open-world game. (no multiplayer. unless you are talking about elder scrolls online)

  • @bayesianlee6447
    @bayesianlee6447 2 роки тому

    I think a descendant of this new agent will soon pass the game Turing test, if such a thing exists.

  • @vincenttv6325
    @vincenttv6325 2 роки тому

    Alan Turing and his team broke Enigma...
    DeepMind has done it with protein folding...

  • @buzhichun
    @buzhichun 2 роки тому

    "Through analysis and hand-authored probe tasks we characterise the behaviour of our agent, and find interesting emergent heuristic behaviours such as trial-and-error experimentation, simple tool use, option switching, and co-operation"
    Is this analysis based on more than a viewer's natural human tendencies towards anthropomorphism and selective perception (i.e. choosing on an arbitrary basis which agent behaviors carry significance and which are negligible)?
    Why is the red agent carrying his own cube to the blue agent's reward zone in CTF?
    Why do the agents keep tagging/killing each other in Catch 'em All?

    • @SpiritFryer
      @SpiritFryer 2 роки тому +1

      There was a red ramp (floor) leading up to blue's floor in the CTF -- maybe the red agent thought it would be faster to earn rewards by using that red ramp. And you can see at 5:18 it was hesitant to leave the area, most likely because of that red ramp.

    • @luismisanmartin98
      @luismisanmartin98 2 роки тому

      @@SpiritFryer That's right - I don't understand why they put that red ramp there

  • @joshuacogliati6085
    @joshuacogliati6085 2 роки тому

    Hm, the paper says that using the tagging gadget on an object makes the object disappear for 3 seconds. Are there any examples in this video of that happening?

    • @joshuacogliati6085
      @joshuacogliati6085 2 роки тому

      For what it is worth, I was looking to see if the AI ever tries experimentally to figure out if it has the tagging or the freezing gadget, since Table 4 in the paper does not list a way for the AI to find out directly.
      Since the tagging gadget removes the object from the map for 3 seconds, one simple way to find out would be for the AI to use the gadget on an object at the start while looking at it and see if it disappears, but I didn't see that happening in the video.

  • @dylancope
    @dylancope 2 роки тому

    Stop Roll is very Sisyphean!

  • @luismisanmartin98
    @luismisanmartin98 2 роки тому

    In situations like 13:15 it would be cool if the agents had a "collective reward" to see if the agent that goes on top would grab the pyramid and bring it down to the other agent so that they could collaborate to achieve maximum reward together

    • @masterchief5603
      @masterchief5603 2 роки тому

      I suppose it would just become more biased with it, meaning we induce the AI to collaborate to get max rewards.

    • @luismisanmartin98
      @luismisanmartin98 2 роки тому

      @@masterchief5603 what do you mean?

  • @relicwarchief640
    @relicwarchief640 2 роки тому

    For "King of the Hill", once the agent found the white platform, it kept going back to that spot for reward. How does it know there isn't any other white floor for reward? It hadn't fully scanned the entire area. I'm new to the field

    • @drdca8263
      @drdca8263 2 роки тому +3

      Maybe the phrasing of the text description it was given of the task, specified “the” and not “a”, and that in the other games it was trained on, that this “a”/“the” distinction was also present? Idk, just a guess.

    • @mimszanadunstedt441
      @mimszanadunstedt441 2 роки тому +1

      Yeah one task to complete is slightly convergent. Should create a larger world perhaps with more agents. But idk how much computation the agents require here.

    • @relicwarchief640
      @relicwarchief640 2 роки тому +1

      @@drdca8263 Oh I see your point. Maybe we just didn't observe the agent for long enough? My "knowledge" comes from David Silver's course on RL, and I remember there being an RNG component in the formulas to force the agent to explore a given environment. Even if there was only one task, there could've been a more efficient way to solve it. But the video is short too. We're just speculating at this point. It is fun though!

    • @relicwarchief640
      @relicwarchief640 2 роки тому

      @@mimszanadunstedt441 Ty for your reply. I understood the second part of your reply. What do you mean by "slightly convergent"? Does that mean that the agent will "converge" onto the first solution it finds and never tries to explore anything else? Sorry, my ML lingo is a bit weak.

    • @mimszanadunstedt441
      @mimszanadunstedt441 2 роки тому

      @@relicwarchief640 I am using it to explain behavior: convergent means they arrive at a singular conclusion, divergent means they invent novel solutions, or multiple solutions. If you test one ground and not multiple, there are fewer solutions possible for the task, thus it's more convergent.

  • @rylanschaeffer3248
    @rylanschaeffer3248 2 роки тому +1

    How is this different from what UberAI/OpenAI did years ago?

    • @wlorenz65
      @wlorenz65 2 роки тому +5

      These agents get the game rules as text and understand them. OpenAI's hide-and-seek rules were hardcoded and did not change. OpenAI's hide-and-seek environments were procedurally generated as well.

    • @xsuploader
      @xsuploader 2 роки тому +2

      OpenAI's agents just played hide and seek;
      these same agents play in 4,000 different game worlds. Much more general.

    • @wenxue8155
      @wenxue8155 2 роки тому +2

      OpenAI trained agents to play specific games. DeepMind trained agents to play some games and then sent the agents to play held-out games, games the agent never played before. The games in the video are the held-out games. Read Ken Stanley's book. This is one step closer to human-level AI. It could be the only practical way: imitating evolution to create human-level AI.

    • @bayesianlee6447
      @bayesianlee6447 2 роки тому

      Generality vs. specificity:
      it's like a machine which can do what a human does vs. a machine which only makes coffee.
      I mean the scalability.

  • @junkaccount7449
    @junkaccount7449 2 роки тому +8

    I want someone to look at me the way the blue agent looks at that purple pyramid

  • @midgetsow
    @midgetsow 2 роки тому +1

    So the AIs agree... Spin to win!

  • @abhishekmangaraj
    @abhishekmangaraj 2 роки тому

    Hey Deepmind, please do something to eliminate corruption from the face of earth

  • @bryanwang7283
    @bryanwang7283 2 роки тому

    I wonder whether it is open-source or not.

  • @chrissmith4444
    @chrissmith4444 2 роки тому

    Blue agent is a beast

  • @RandomMe93
    @RandomMe93 2 роки тому

    Am i the only one that does not hear any sounds?

  • @Adhil_parammel
    @Adhil_parammel 2 роки тому

    What's the difference between this and MuZero?

    • @drdca8263
      @drdca8263 2 роки тому +5

      Aiui, muzero is a single architecture/design which, for many board games, if trained on that game , will reach sota performance on that game, without people needing to do anything specific to make it work for that game,
      but muzero is trained on the different games separately, so for each game, you initialize the network weights, train it, and then you get a network which is very good at that game.
      With this, on the other hand, they train the same network on a large variety of games, such that the same agent is good at many games.
      What follows is a kind of dumb metaphor:
      imagine you have a cloning machine that makes babies. Muzero lets you, for each game, generate a baby and raise it into a person who is good at that game. They all have the same genes/architecture, but they have learned different things.
      This, however, lets you generate a single baby, and raise it while teaching it many different games,
      and furthermore, do this in a way such that it can figure out a new game quickly.
      Uh, putting aside the metaphor with people because the next distinction doesn’t really work very well with people:
      The games which it is shown playing here, aiui/iirc, are not ones it has been trained on. Rather, because of the vast variety of games it has trained on (which each had textual descriptions), it has *learned how to figure out* what to do in a game, based on a combination of the description of the goal, as well as trying things out.
      Hope that answers your question.
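
      A rough way to picture that distinction in code (purely illustrative; the helper names below are made up and stand in for whatever training machinery is actually used):

      ```python
      import random

      def init_network():
          return {"weights": 0.0}      # stand-in for a real model

      def train_step(net, task):
          net["weights"] += 1e-3       # stand-in for one self-play / gradient update
          return net

      # MuZero-style: same architecture, but a fresh network per game -> one specialist each.
      def train_per_game(games, steps=1000):
          experts = {}
          for game in games:
              net = init_network()     # weights re-initialized for every game
              for _ in range(steps):
                  net = train_step(net, game)
              experts[game] = net
          return experts

      # XLand-style (this paper): one network trained across a whole task distribution,
      # sampling a new world + goal each episode, so the same agent can then be
      # dropped into held-out tasks.
      def train_generalist(task_distribution, steps=1000):
          net = init_network()
          for _ in range(steps):
              net = train_step(net, random.choice(task_distribution))
          return net
      ```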

    • @Adhil_parammel
      @Adhil_parammel 2 роки тому +1

      @@drdca8263 yes,👌

    • @Adhil_parammel
      @Adhil_parammel 2 роки тому

      @@drdca8263 Is there any way to connect GPT-3 and this MuZero, so that we get textual teaching of chess and Go, etc.?

  • @ProlificSwan
    @ProlificSwan 2 роки тому +1

    How is this different from OpenAI's approach with the hide and seek agents? The results seem to be similar or possibly worse in some cases.

    • @abcqer555
      @abcqer555 2 роки тому

      Agreed

    • @samlouiscohen
      @samlouiscohen 2 роки тому +6

      OpenAI’s paper focused exclusively on the task of hide and seek. This paper is much more impressive (IMO) as it demonstrates agents that are able to perform “well” across a much more diverse set of tasks while exhibiting zero-shot behavior to succeed at unseen tasks like capture the flag and hide and seek.

    • @ProlificSwan
      @ProlificSwan 2 роки тому +1

      @@samlouiscohen that's a good point. For zero shot behavior, this is quite impressive. I think I would be more impressed if they took these results and extended them to show mastery with few shot learning.

  • @whoowhaaat5666
    @whoowhaaat5666 2 роки тому

    No sound; at least make the captions bigger and easier to read.

  • @EvilStalkerBE
    @EvilStalkerBE 2 роки тому

    No sound makes it extremely difficult to watch...

  • @DerAnonymeMax
    @DerAnonymeMax 2 роки тому +8

    They are kinda cute 😂

    • @LouisChiaki
      @LouisChiaki 2 роки тому +1

      Except when one eliminates the other competitor without any effort.

  • @stigcc
    @stigcc 2 роки тому +1

    I would love to see them play football

    • @wlorenz65
      @wlorenz65 2 роки тому

      You would have to disable their hands so that they cannot lift the ball. Otherwise it should be possible if the goals are defined as colored floors.

    • @masterchief5603
      @masterchief5603 2 роки тому

      @@wlorenz65 You can deduct reward for lifting the ball, to the extent that it would not be worth it for the agent to lift it. The zap function should also be handled like that.

  • @Daniel-ih4zh
    @Daniel-ih4zh 2 роки тому

    Bruh, what da robot doin 💀💀

  • @Anujkumar-my1wi
    @Anujkumar-my1wi 2 роки тому

    Can you tell who first introduced the concept of 'state' ?

    • @drdca8263
      @drdca8263 2 роки тому

      Like, the concept of “the state of a system”? I imagine it is fairly old idea. Slightly less old might be the concept of a state space.
      I might be misinterpreting you.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 2 роки тому

      Yes, I am talking about "the state of a system". Can you tell me what the definition of the "state" of a system is, as its definition varies from one field to another?

    • @drdca8263
      @drdca8263 2 роки тому +1

      @@Anujkumar-my1wi the (current) state of a system is the way that it (currently) is. (While a previous state of a system is the way it was then).
      Often we will model a system by describing its state with a set of numeric parameters (e.g. “the position and velocity of each particle”)
      Generally the state refers to the parts that could potentially change, not including things that are just inherently part of how the system always is.

    • @Anujkumar-my1wi
      @Anujkumar-my1wi 2 роки тому

      @@drdca8263 In reinforcement learning, some papers refer to the state as a 'summary of the history of actions, observations, etc.' Why is that?

    • @drdca8263
      @drdca8263 2 роки тому

      @@Anujkumar-my1wi well, if these are the things that the agent remembers and can act based on, then that, having those things in its memory, is the way that it is.
      Now that you mention that, I think I have a way to rephrase the description I gave before that might be more clear.
      A thing which has a state, and is changing over time, either it has an environment that it interacts with, or it doesn’t,
      and the thing that determines how the thing behaves at each time, is a combination of its current state, possibly some input from the environment (if applicable) and maybe some randomness can also matter,
      and the “how it behaves” at each time includes how its state changes.
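
      One common way to formalize that "summary of history" idea is a recurrent state update: the agent's internal state at time t is a function of its previous state, its last action, and the newest observation. A generic sketch (not specific to this paper; a real agent would compress the history with something like an RNN hidden vector):

      ```python
      def update_state(prev_state, prev_action, observation, max_len=10):
          """state_t = f(state_{t-1}, action_{t-1}, observation_t).
          Here the 'summary' is just a truncated explicit history."""
          history = prev_state + [(prev_action, observation)]
          return history[-max_len:]   # keep only the recent past: a lossy summary

      state = []  # initial state: empty history
      for action, obs in [("noop", "o0"), ("turn_left", "o1"), ("zap", "o2")]:
          state = update_state(state, action, obs)
      print(state)  # [('noop', 'o0'), ('turn_left', 'o1'), ('zap', 'o2')]
      ```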

  • @alicanakca8116
    @alicanakca8116 2 роки тому +1

    I hope final bosses in games don't use these algorithms.

  • @wisgarus
    @wisgarus 2 роки тому

    Can separate agents learn to work together with little or no directions?

    • @NoHandleToSpeakOf
      @NoHandleToSpeakOf 2 роки тому

      that is what they did

    • @wisgarus
      @wisgarus 2 роки тому

      @@NoHandleToSpeakOf I meant like, cooperation and helping each other

  • @psylink
    @psylink 2 роки тому

    Well, I guess the only men left on earth may well be engineers and janitors...

  • @slbumkim2925
    @slbumkim2925 2 роки тому

    Being has a tendency to 'return' to clusters
    ='the nature of solidarity' -a desire for empathy -(Wave)-(yin)
    and also,
    Being has a tendency to 'exist' as individuals
    ='the nature of self-expansion' -a desire to breed-(Particle)-(Yang)
    Likewise, humans have two elements.
    We must realize that we all have both left and right elements
    =Solidarity and Self reliance
    No one has only one element.
    so 'Sum' derived from 'two poles' , (thesis, antithesis, synthesis)
    To develop intellect and ethics by harmonizing the two,
    It is good to realize it and balance it properly
    But A few people polarized the crowd(political partisanship)
    without balancing themselves.
    And They stole only the sum, only the synthesis from the triangle composition.
    Now We all have to get out of this deceptive situation.
    This is not the time for us to hate each other.
    We have to track down those who have been manipulating us.

    • @israelRaizer
      @israelRaizer 2 роки тому

      Do you happen to know what is the name of this "tendency to return to clusters" that humans have?

  • @UnfilteredTrue
    @UnfilteredTrue 2 роки тому

    This can be used to develop better AI in games.

  • @kiachi470
    @kiachi470 2 роки тому +1

    Reward is Enough.

  • @SudarsanVirtualPro
    @SudarsanVirtualPro 5 місяців тому

    It must be done in computer time not human time

  • @soderberg8932
    @soderberg8932 2 роки тому

    the green one is beating their ass

  • @TheBigLou13
    @TheBigLou13 2 роки тому +9

    NEVER combine open-ended learning with direct actions in the real world. As soon as it learns that it can learn more by stopping people from stopping it, we're doomed by design.

    • @TheBigLou13
      @TheBigLou13 2 роки тому

      Self awareness might play a role here.
      (knowing that it knows stuff and the ability to compare itself to others)

    • @herzkine
      @herzkine 2 роки тому +1

      @@TheBigLou13 Actually, you pointed out very well in your initial comment that no self-awareness or evil is needed to get us fecked.

  • @LouisChiaki
    @LouisChiaki 2 роки тому +2

    Or, a random number generator leads to generally capable agents?
    Some of the experiments look like they could be achieved by just acting randomly. We need to compare to a random number generator as the baseline.
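
    One fair way to address that worry is to score a uniformly random policy on the same probe tasks and compare. A tiny sketch of such a baseline (the `env` interface here is hypothetical, just to show the shape of the comparison):

    ```python
    import random

    def evaluate_random_baseline(env, episodes=100):
        """Average return of a uniformly random policy; a learned agent should
        clearly beat this on tasks that genuinely require competence."""
        returns = []
        for _ in range(episodes):
            # assumed interface: reset() -> obs, step(a) -> (obs, reward, done)
            obs, done, total = env.reset(), False, 0.0
            while not done:
                action = random.choice(env.action_space)  # ignores observations entirely
                obs, reward, done = env.step(action)
                total += reward
            returns.append(total)
        return sum(returns) / len(returns)
    ```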

  • @guilesmart7486
    @guilesmart7486 2 роки тому

    Obvious conclusion: blue is better than red.

  • @NextFuckingLevel
    @NextFuckingLevel 2 роки тому +1

    OpenAI POET team : 🗿🗿🗿

  • @chlorine8477
    @chlorine8477 2 роки тому

    Very interesting, but also worrisome. I can already imagine tireless, ruthless agents playing "capture the dissident"

  • @Mynestrone
    @Mynestrone 2 роки тому

    I hate that you put them in competition with each other and let them inflict violence to achieve their goals

  • @Jersey1225
    @Jersey1225 2 роки тому

    For playing games, AI is better than me.

    • @444haluk
      @444haluk 2 роки тому

      Thankfully humans are not game players but representation builders for real world problems. A game is basically a planning exercise for fun.

  • @MGPL_
    @MGPL_ 2 роки тому

    Now you need an AI to help you with narration and avoiding white captions on light backgrounds.

  • @FallOfTheLiving
    @FallOfTheLiving 2 роки тому +1

    No voice over or music

  • @prathameshjadhav780
    @prathameshjadhav780 2 роки тому

    wtf is this i wanna hear something

  • @thegreatdream8427
    @thegreatdream8427 2 роки тому

    I don't understand why anyone is studying this. It is far too dangerous. Humanity is not ready for AGI, and just because it's possible doesn't mean it's desirable. Give it a rest, please! Focus on building narrow AIs for specific tasks until the alignment problem is solved!

    • @luckys541
      @luckys541 2 роки тому +2

      The problem is no one knows when the first AGI will appear. There are a lot of organizations in the world trying to solve AGI, and given the acceleration of research in this field, I think AGI may arrive at any time. If the first AGI built is not safe, this will be catastrophic. So those who are extremely cautious about AI safety should build the first AGI before others who are not.
      I think this DeepMind research based on reinforcement learning is more likely to reach safe AGI than other AGI attempts. If you can create an AGI based on the same principles as human evolution, I think there is less chance of AGI turning bad for the world. But still, great care should be taken.

    • @thegreatdream8427
      @thegreatdream8427 2 роки тому +1

      @@luckys541 I think we ought to focus on augmenting human intelligence first, so that we'll always be able, at least collectively, to outsmart whatever AGIs are being produced, at least until the alignment problem is fully solved.

  • @444haluk
    @444haluk 2 роки тому

    They play really badly, though. And we all know these kinds of graphics, physics and "game mechanics" are useless.

    • @drdca8263
      @drdca8263 2 роки тому +8

      But take into account that they weren’t trained on these particular games (unless noted otherwise). Aiui, they are essentially seeing these games for the first time.
      You don’t expect an expert chess player to be an expert go player the first time they play go.
      The point is the generalization to new tasks (based on a text description + “are you winning right now”), not the skill at the particular tasks.

    • @444haluk
      @444haluk 2 роки тому

      @@drdca8263 If you think gradient descent can generalize to out-of-distribution scenarios, well, just read the paper and see, buddy. I can tell you specific lines if you want.

    • @KivySchool
      @KivySchool 2 роки тому +1

      @@444haluk which lines

    • @444haluk
      @444haluk 2 роки тому

      @@KivySchool I am writing the answer but it gets deleted. Page 3: the unseen worlds are a convex combination of the old worlds in a generator.

    • @444haluk
      @444haluk 2 роки тому

      @Jože Ws I am writing the answer but it gets deleted. Page 3: the unseen worlds are a convex combination of the old worlds in a generator.

  • @nerdomania24
    @nerdomania24 2 роки тому +2

    this is so poor, it hurts.