Reward Is Enough (Machine Learning Research Paper Explained)

  • Published 30 May 2024
  • #reinforcementlearning #deepmind #agi
    What's the most promising path to creating Artificial General Intelligence (AGI)? This paper makes the bold claim that a learning agent maximizing its reward in a sufficiently complex environment will necessarily develop intelligence as a by-product, and that Reward Maximization is the best way to move the creation of AGI forward. The paper is a mix of philosophy, engineering, and futurism, and raises many points of discussion.
    OUTLINE:
    0:00 - Intro & Outline
    4:10 - Reward Maximization
    10:10 - The Reward-is-Enough Hypothesis
    13:15 - Abilities associated with intelligence
    16:40 - My Criticism
    26:15 - Reward Maximization through Reinforcement Learning
    31:30 - Discussion, Conclusion & My Comments
    Paper: www.sciencedirect.com/science...
    Abstract:
    In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
    Authors: David Silver, Satinder Singh, Doina Precup, Richard S. Sutton
    Links:
    TabNine Code Completion (Referral): bit.ly/tabnine-yannick
    UA-cam: / yannickilcher
    Twitter: / ykilcher
    Discord: / discord
    BitChute: www.bitchute.com/channel/yann...
    Minds: www.minds.com/ykilcher
    Parler: parler.com/profile/YannicKilcher
    LinkedIn: / yannic-kilcher-488534136
    BiliBili: space.bilibili.com/1824646584
    If you want to support me, the best thing to do is to share out the content :)
    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: www.subscribestar.com/yannick...
    Patreon: / yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
  • Science & Technology

COMMENTS • 356

  • @YannicKilcher
    @YannicKilcher  3 years ago +9

    OUTLINE:
    0:00 - Intro & Outline
    4:10 - Reward Maximization
    10:10 - The Reward-is-Enough Hypothesis
    13:15 - Abilities associated with intelligence
    16:40 - My Criticism
    26:15 - Reward Maximization through Reinforcement Learning
    31:30 - Discussion, Conclusion & My Comments

    • @mykalkelley8315
      @mykalkelley8315 3 years ago +1

      An AGI solely focused on its own rewards sounds like how a psychopath thinks, only seeking its own reward at the expense of others. There should also be a counterbalance of some kind.

    • @Eric_Cartman001
      @Eric_Cartman001 3 years ago

      @@mykalkelley8315 Our intelligence started from a monad. AI systems will learn about cooperation much more quickly than biological evolution did.

  • @altvali1
    @altvali1 3 years ago +32

    Infinite time to brute-force a solution is all you need.

  • @lucyzhang7385
    @lucyzhang7385 3 years ago +11

    The paper "Where does value come from?" by Juechems and Summerfield (2019) has a really good discussion of the reward paradox, which I think also speaks to the limitations of this hypothesis.

  • @dandan-gf4jk
    @dandan-gf4jk 3 years ago +62

    Next paper: Splitting is all you need (aka revert to bacteria)

    • @Supreme_Lobster
      @Supreme_Lobster 3 years ago +14

      Return to monke v2

    • @andres_pq
      @andres_pq 3 years ago +4

      @@Supreme_Lobster return to eukaryote

    • @segelmark
      @segelmark 3 years ago +3

      @@andres_pq return to mitochondria

    • @freemind.d2714
      @freemind.d2714 3 years ago +2

      How about they just say: what we already know is all we need! But it is not! I agree with "greatness cannot be planned"!

  • @jacobheglund4245
    @jacobheglund4245 3 years ago +32

    It's kinda sad to see that some of the foundations of this paper aren't well-established (like a testable definition of intelligence). Especially since the authors are prominent researchers in the area of RL, they should know better and be systematically developing the necessary foundations as part of the paper if they want to make broad claims about an "intelligence hypothesis".

    • @thorcook
      @thorcook 1 year ago

      Agreed. Is it even _possible_ to offer a more general, vague, and completely unusable set of platitudes? There is not a single definable or measurable component or benchmark in this entire 'thesis'. They don't even bother to offer definitions of many/most crucial components, let alone define them adequately or practically (and/or with reference to any clearly established [applicative] context or conceptual framework). What are the measurable (and relevant) specifics, boundaries, orders of degrees (discrete units, gradients, etc.), conceptual/practical context, etc., for terms/concepts like "complexity", "environments", "rewards (/+ 'reward signals')", "generalization", "intelligence", "abilities", "learn", "knowledge", "perceive", and so on? Even if the hypothesis presented were plausible (which the paper fails to demonstrate in any conceivable way, even theoretically), the crux of the problem rests [entirely] on HOW an AI could 'maximize reward' (a question the authors pose, or reference, casually as if it were merely tangential or supplementary, and then offer no attempt or suggestion at an answer, or even hint at a general direction to one). NO, reward is NOT enough; not if an AI doesn't know the first thing about HOW to maximize 'paperclips' [or insert any hypothetical goal and associated, yet unspecified, reward function], or what that means, etc. Enabling it with such capacities is precisely the task for which the authors claim '_"reward"_ is enough'. Obviously, therefore, it is NOT.
      Yannic gets it right by recognizing it as not much more than a collection of 'tautologies'. One might as well have posited: 'General intelligence is required (or "enough") for general [artificial] intelligence'. So, 'thanks, but no thanks,' for a 13-page philosophical-academic and rather intellectually lazy circle jerk. These sorts of armchair explorations into AI development are getting tiresome. In fact, I might suggest a more appropriate title for this "research" paper:
      "Enough is Enough."
      There's a tautology for you. (And perhaps a significantly more useful one, sadly.) Let's do better.

  • @jomoho3919
    @jomoho3919 3 years ago +82

    Plot twist: ML paper claims evolution is intelligent design.

    • @samanthaqiu3416
      @samanthaqiu3416 3 years ago +5

      best comment for the last 3 months

    • @pensiveintrovert4318
      @pensiveintrovert4318 3 years ago

      The stupid design agents all died out. It is all in the definition of intelligence. Is having behaviors that allow an organism to survive intelligent? The DNA stores a lot of information that has been learned through RL.

    • @akramsystems
      @akramsystems 3 years ago +4

      Oh my god!

  • @soccerplayer922
    @soccerplayer922 3 years ago +65

    This entire paper seems like a major hand waving exercise.

    • @nahakuma
      @nahakuma 3 years ago +8

      @Mike Depies I think their purpose is more about moving the field and public opinion towards the direction they consider the most promising for AGI.

    • @technokicksyourass
      @technokicksyourass 3 years ago +1

      @@nahakuma Best way to do that would be by writing some equations and providing a proof.

    • @soccerplayer922
      @soccerplayer922 3 years ago +7

      @@nahakuma Worst thing about this is that it was behind a paywall. This is more blog quality than academic paper quality. It basically waxes and wanes on absurd assumptions. They're so bent on this approach that they force everything into the RL abstraction instead of admitting the limitations of the formulation. It just doesn't scream insightful work in any sense.

    • @soccerplayer922
      @soccerplayer922 3 years ago

      @@nahakuma Yeah but if that was the intent wouldn't the paper be a non-paywall paper? Idk it just read more as blog content than academic paper content. They were incredibly vague in details, and essentially restated a (flawed and ) widely held belief that we just need to run the thing longer and it works. But maybe I'm mistaken and people found value in the somewhat rehashed and unimaginative claims put forth.

    • @jeremykothe2847
      @jeremykothe2847 3 years ago

      The whole thing is a non sequitur. D'oh - attaining goals is what intelligence is for, so train something to attain goals and it must become intelligent. Sure, but how? It's like Darwinism, but theoretical. We know how genes work, and it isn't like this. Survival, not 'eating the most food', is the goal. And we know it works, but we don't have the computational power to simulate the universe. tl;dr: "no shit, Sherlock"

  • @andrewcutler4599
    @andrewcutler4599 3 years ago +10

    One cool loss function for intelligence would be trying to surprise other agents in the simulation. Agents can influence their environment (e.g. throw rocks, sing, build structures). They are also tasked with predicting what will happen in the next frames. They are rewarded for surprising others and correctly predicting the future. So each agent would have two networks: one to predict the future and another to suggest actions. These networks can share weights at the base levels to encourage learning.
    I think this encourages intelligence more directly than most rewards (e.g. collect rocks), which may collapse. Indeed, in real life the reward function of "replicate yourself" can be fairly unstable. If foxes are too good and eat all the rabbits, then the foxes can go extinct. Sometimes a locally good strategy leads to collapse of the system. In the example of rewarding agents for collecting rocks, maybe one agent learns to kill other agents to steal their rocks, becomes more powerful, and takes all the rocks. Now what?
    Surprising other agents (and not being surprised) seems more stable. More social.

    • @YannicKilcher
      @YannicKilcher  3 years ago +5

      Nice idea. How would you prevent agents from just performing random actions? That would be the most unpredictable. But I like the approach. Come to our Discord and ping me there.

    • @andrewcutler4599
      @andrewcutler4599 3 years ago +3

      @@YannicKilcher Good point. I suppose one would need a proper reward as well to keep the agents honest. I forget exactly, but some team (maybe OpenAI?) used "surprise" as an auxiliary loss when training agents to play Atari (or some such) games. Obviously the main reward would be the score on the game, but this helped the agent not get stuck on hard levels.
      To train for intelligence, I suggest the importance would be reversed.
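A minimal toy sketch of the surprise-based reward wiring discussed in the thread above. Everything here (the 1-D world, the running-average "predictor", the random "actor") is an illustrative assumption rather than the commenter's specification; it only shows how such a reward could be computed, not how the actor would actually be trained.

```python
# Toy sketch: each agent is rewarded for being hard to predict by the other agent
# while predicting the other agent well. All names and dynamics are assumptions.
import random

class Agent:
    def __init__(self):
        self.position = 0.0
        self.estimate_of_other = 0.0      # stands in for the "predict the future" network

    def predict_other(self):
        return self.estimate_of_other

    def act(self):                        # stands in for the "suggest actions" network
        self.position += random.choice([-1.0, 1.0]) * random.random()
        return self.position

    def update(self, observed_other):     # crude learning: exponential moving average
        self.estimate_of_other = 0.9 * self.estimate_of_other + 0.1 * observed_other

a, b = Agent(), Agent()
for step in range(100):
    pred_a, pred_b = a.predict_other(), b.predict_other()
    pos_a, pos_b = a.act(), b.act()
    # reward = surprise caused to the other agent minus own prediction error
    reward_a = abs(pred_b - pos_a) - abs(pred_a - pos_b)
    reward_b = -reward_a
    a.update(pos_b)
    b.update(pos_a)

print("last-step rewards:", round(reward_a, 3), round(reward_b, 3))
```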

  • @segelmark
    @segelmark 3 years ago +63

    Evolution is all you need. Darwin called, he wants his paper back.

    • @segelmark
      @segelmark 3 years ago +12

      Am I missing something? The universe has already proven this? Self-preservation is enough to evolve intelligent life. Maybe it doesn't happen every time the universe runs, but it did at least once.

    • @nahakuma
      @nahakuma 3 years ago +4

      @@segelmark I think the problem is that people won't believe this proof, and so the authors are forced to state the obvious. However, as you say, you not only need a reward but also luck (how much luck probably depends on the specific set of physical laws), since it might happen that in a specific run of the universe intelligence does not emerge.

    • @HeavenlySkyFriday
      @HeavenlySkyFriday 3 years ago +4

      @@nahakuma Luck is a part of state transition which reinforcement learning already accounted for. It is simply a stochastic dynamic of an environment.

    • @soccerplayer922
      @soccerplayer922 3 years ago +1

      @@segelmark This is a misreading of evolution. We actually cannot ascribe a direction to evolution like this. We assume self-preservation is the hallmark motivator, but this has yet to be borne out in any models. There is a blind spot in the evolutionary model in this way. We know that natural selection is a force, but to reduce evolution to self-preservation would be to miss the mark by too much, I think.

    • @nahakuma
      @nahakuma 3 years ago +2

      @@soccerplayer922 There is actually no force, and so the spot you mention is just illusory. Evolution is just like centrifugal forces and the invisible hand: an emergent fiction that was given a name. The fact that the things that survive are those that are better adapted for survival does not require any force.

  • @GreenManorite
    @GreenManorite 3 years ago +33

    Someone should have stopped by the philosophy or even the economics department and collected some feedback. Smart people working outside their field can forget that people have been considering issues for decades/centuries.

  • @kirillnovik8661
    @kirillnovik8661 3 years ago +35

    I'm surprised to see such a speculative paper make it through peer review.
    Especially the fact that the authors didn't provide the definition of intelligence that they are building their argument on top of.
    Their hypothesis is definitely plausible, but only if they defined intelligence through behavior, where it is a function that takes a state of the environment and outputs certain behavior.
    If we define intelligence through consciousness (and that seems to be the stumbling block of arguments like this), then we are dealing with black boxes and metaphysics.
    Since when have scientific standards stooped so low? 🤔
    Or wait... did they indirectly define intelligence as whatever quality a reward-based system develops over time? 😄

    • @HoriaCristescu
      @HoriaCristescu 3 years ago +3

      Intelligence is the ability to generalize learned abilities to new situations; they do insist on the environment being complex enough that it supplies enough challenge. That's the key of their thesis: goal diversity leads to intelligence through reward maximization.

    • @MrCmon113
      @MrCmon113 3 years ago +2

      >only if they defined intelligence through behavior
      Well, it is. That's what people mean by intelligence: outperforming certain benchmarks using certain resources.
      >If we define intelligence through consciousness
      Then you're confusing two separate terms. There doesn't need to be any intelligence for consciousness in principle, nor does intelligence require consciousness. They might be connected in the real world (there are hypotheses linking consciousness to locality of information), but they are still completely different things.

  • @prithvirajgawande6150
    @prithvirajgawande6150 3 years ago +12

    I think for evolution "to exist more" (by living longer or by replication) is the reward, which itself becomes a self-fulfilling property justifying its existence.

    • @boss91ssod
      @boss91ssod 3 years ago +1

      true, everything else arises from this

  • @UmbertoValleriani
    @UmbertoValleriani 3 years ago +11

    Paperclip maximizers is all you need

  • @alabrrmrbmmr
    @alabrrmrbmmr 3 years ago +1

    Thanks again for doing this. I'm learning not only about new research, but also how to break down a paper.

  • @tae898
    @tae898 2 years ago +2

    I think reward played a huge role in evolution, but some kind of randomness also played a huge role. The fact that bacteria and humans evolved in different ways, where the former didn't develop intelligence and the latter became intelligent, despite both of them pursuing the same reward, shows me that some randomness played a role here.

  • @AhmedAshraf-qe6rh
    @AhmedAshraf-qe6rh 2 years ago

    27:23 your two question marks!!! First paragraph in Section 4 could be a subtle allusion to Bellman's recursion :) You're doing a great job reviewing papers!

  • @gilshapira3498
    @gilshapira3498 2 years ago

    Bacteria are a powerful counter-example to the hypothesis that reward is enough - thanks Yannic for yet another interesting paper review.

    • @bojhuang
      @bojhuang 1 year ago

      A bacterium dies in its environment much more easily and quickly than a human being, and had the bacterium had human-level intelligence it would have a better chance of surviving longer and better. Consequently, survival as a reward is enough to elicit intelligence (even) in the bacteria example.
      The argument that "bacteria *as a species* survive more widely than humans *as a species*" is irrelevant to the hypothesis here, because intelligence is not a property of a species but of an individual organism.

  • @herp_derpingson
    @herp_derpingson 3 years ago +15

    0:00 Oh, David Silver! His courses on RL are amazing. That's where I learnt all my RL.
    .
    24:30 From an evolutionary perspective, there is a lot of random chance involved. There are many evolutionary branches where the environment changed randomly, which forced the creatures to adapt and become stronger. When the environment became normal again, they dominated. Bacteria and humans are equally evolved, but they are adapted to different environments. We cannot say that the environments for bacteria and humans are the same, because although they are the same right now, there were times when things diverged locally.
    .
    27:10 Eh?
    .
    I think the whole paper is a really complicated way of saying, "If it does what we wanted it to do, then it's intelligent enough. Good night."
    .
    Not a big fan of idea papers. If you want to share your ideas, then make a blog post or YouTube videos.

    • @YannicKilcher
      @YannicKilcher  3 years ago +9

      Yes exactly. And the "sufficiently complex" environment seems to be defined as any environment where reward maximization would elicit intelligence, and that closes the loop :D
      Also: Eh?

    • @HoriaCristescu
      @HoriaCristescu 3 years ago

      @@YannicKilcher Maybe it's all about openendedness or diversity of subgoals the agent learns to maximize reward on.

  • @norabelrose198
    @norabelrose198 3 years ago +1

    At first I was rather sympathetic to the thesis of this paper but your argument at around 19:00 is quite good so now I’m a lot less sure. It seems like what we think of as “intelligence” may only be a local optimum configuration for certain types of reward functions and better optima could look very “dumb” from our POV

  • @harshavardhanatg6154
    @harshavardhanatg6154 1 year ago

    Loved your explanation. We can strongly assert that evolution is not driven by reward maximization; it is mainly due to mutations, which are random. We can perhaps formulate a new, moderated question along these lines: given that an agent has prior intelligence and a set of gear to interact with the environment (say, evolution has endowed the agent with both of these things), is reward enough to fine-tune to the environment perceived by the agent? We could use pre-trained transformers to set up this problem and dissect the inner layers of the transformers to gain a better understanding of the above question...

  • @zhangcx93
    @zhangcx93 3 years ago +9

    According to "Reward is Enough", if you give a dog rewards, it can keep getting smarter and surpass all intelligent agents.

    • @manuelillanes1635
      @manuelillanes1635 2 years ago +2

      I don't see a problem with that, if you give it enough time it will do it

  • @Isinlor
    @Isinlor 3 years ago +2

    How is this any different from postulates in Superintelligence by Bostrom? I have no access to the paper, but seems like they just say that instrumental goals are a thing.

  • @cooldhongibaba
    @cooldhongibaba 2 years ago

    Hands down the best intro to a paper review!!🙌😂

  • @svenwientjes8358
    @svenwientjes8358 3 years ago +1

    At 27:13 I am pretty sure they are referring to the recent breakthroughs in meta-learning, where through very general 'model-free' RL, very complex and efficient, task-specific learning algorithms can be learned ('model-based RL'). This allows those agents to learn particular problems quicker, thanks to RL. "Learning to learn" so to say :D. Very interesting theory in neuroscience also points to this, a famous paper is called "Prefrontal cortex as a meta-reinforcement learning system" if you're interested. I agree that the language they use there is quite vague, but it refers to this specific idea. Hopefully it helps!

    • @nahakuma
      @nahakuma 3 years ago +1

      While I agree with you that meta-learning is probably the path, I think they are referring to a slightly different thing. There are a lot of papers that try to maximize reward indirectly by endowing agents with various inductive biases (which might be more efficient than learning to learn in a lot of environments), and I feel that what the authors are saying is that we should not worry so much about these inductive biases, but just about maximizing the rewards (and probably the best way to maximize them in a completely general way is through meta-learning). For example, it seems they are pushing back a little against the new wave of offline RL.

    • @svenwientjes8358
      @svenwientjes8358 3 years ago

      @@nahakuma I agree with you that they seem to wish to do away with 'endowed' inductive biases! Maybe this is exactly what you are saying as well: the way I understand the theory of meta-learning now is that it allows a general-purpose RL algorithm to learn the particular inductive biases relevant for the particular task distribution you train it on. For now it seems very dependent on how that task distribution is specified, but the general idea seems to be that a task distribution of some class such as 'survive' or 'reproduce' might allow the inductive biases that make us 'intelligent' to arise 'for free'. In that case, Reward is Enough :D...
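A minimal toy sketch of the "learning to learn" idea from the thread above: an outer loop that selects an inner learning rule purely by the reward it produces across a distribution of tasks. The quadratic tasks, the learning-rate grid and the function names are illustrative assumptions, not the meta-RL setups cited in the thread.

```python
# Toy "learning to learn": the inner loop is plain gradient descent on simple
# quadratic tasks; the outer loop picks the inner rule's hyperparameter by reward alone.
import random

def inner_learn(task_target, lr, steps=20):
    """Inner loop: gradient descent on (w - target)^2 starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2 * (w - task_target)
    return (w - task_target) ** 2          # final loss on this task

def meta_objective(lr, n_tasks=50):
    """Outer objective: average final loss of the inner rule over a task distribution."""
    tasks = [random.uniform(-5, 5) for _ in range(n_tasks)]
    return sum(inner_learn(t, lr) for t in tasks) / n_tasks

# Outer loop: select the inner-loop learning rate purely by how well it does (grid search).
best_lr = min([0.01, 0.05, 0.1, 0.3, 0.5], key=meta_objective)
print("meta-learned inner learning rate:", best_lr)
```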

  • @MrAms96
    @MrAms96 3 years ago +4

    The intelligence and learning dilemma, the chicken-and-egg problem: can't we just consider it a matter of initialization? If it is a recursive process, then any agent that learns will become intelligent and, in doing so, improve its learning process by figuring out how to learn better. Just a thought.

  • @xiaxu6942
    @xiaxu6942 3 years ago +5

    It's like non-essentialism. Basically, we don't need to define what exactly intelligence is. Just optimize the reward and we'll get there. And then we call whatever we got 'intelligence'.

    • @HeavenlySkyFriday
      @HeavenlySkyFriday 3 years ago +1

      @Hari Thapliyaal Not explicitly defining intelligence != not knowing the goal

    • @HoriaCristescu
      @HoriaCristescu 3 years ago

      @Hari Thapliyaal Nature doesn't need to define intelligence, intelligence affirms itself by existing, and by virtue of it being intelligent it exists.

    • @MrCmon113
      @MrCmon113 3 years ago

      Intelligence in AGI is beating humans at all cognitive tasks we care about.

  • @richardbrucebaxter
    @richardbrucebaxter 3 years ago

    21:00 - Bacteria consist of multiple species. Intelligence gives an organism/species the ability to thrive in novel/arbitrary environments (including space), via memetics. The ability of a domain/phyla to biologically adapt to novel environments through natural selection is a different ability.

  • @GuillermoValleCosmos
    @GuillermoValleCosmos 3 years ago +17

    Maybe a truer hypothesis could be: "maximizing reward in an environment which already has intelligence deeply embedded in it (e.g. human society), is sufficient to develop intelligence" ?

  • @oreganorx7
    @oreganorx7 3 years ago +1

    Hi Yannic, I am an AI from the future, projecting this comment into the past ; ) I wonder if "Reward + specific environmental conditions is Enough" would be a more accurate hypothesis. As you pointed out, there is a niche where bacteria are the optimal solution but there is a niche where human intelligence is the optimal solution. Perhaps that is captured simply by which environment the agent wishes to thrive in. Environment in this case would have to include scale (i.e. human and bacteria may exist in the same place at the same time but the environment that each faces is drastically different due to differences in scale).

  • @DistortedV12
    @DistortedV12 2 years ago

    Hierarchical reward of power mediated by curiosity, skill proficiency and social influence? Also, what are the priors that the agent needs to have? (Are the SOTA pre-trained models enough?)

  • @avidrucker
    @avidrucker 3 years ago

    @Yannic Kilcher Do you have any favorite films that you believe to be quite believable/realistic in their depiction of ML/AI (past, present, or future) ?

  • @I_have_solved_AGI
    @I_have_solved_AGI 1 year ago +1

    How does one keep defining reward every step of the way for AGI, or what is the ultimate reward?

  • @fbdalian
    @fbdalian 3 years ago +1

    Maybe it depends on the reward and the population of agents. For example we should try a system with male agents trying to reproduce with female agents, and female agents trying to catch the most powerful male :D So each one (male and female) is rewarded by the other category members.

  • @galchinsky
    @galchinsky 3 years ago +5

    Free articles is all you need

    • @galchinsky
      @galchinsky 3 years ago

      I mean, really, no arxiv?

    • @alpers.2123
      @alpers.2123 3 years ago

      Alexandra Elbakyan is all you need

  • @KenFehling
    @KenFehling 3 years ago +5

    This was basically my idea of how AI could work before I got into learning about it. It would be cool if it was this simple and intuitive.

  • @NeoShameMan
    @NeoShameMan 3 years ago

    I feel like we should probably start separating intelligence into retrospective intelligence and prospective intelligence. ML methods like DNNs are very good at retrospective intelligence, while symbolic manipulation is very good at prospective intelligence. But while symbolic approaches generally hit the wall of the dictionary problem, DNNs excel at it. IMHO I (obviously abusively) reclassify DNNs as fuzzy, abstract, compressed databases; it just turns out that a fancy database covers a lot of the functions needed by intelligence. But we probably need an architecture that bridges the gap from that database to symbols, which means latent-space manipulation. Adding an output working memory to an architecture like GPT-2, as in AI Dungeon, seems to give good results, but so far GPT-style reasoning is based on random sampling, and training is based on plausibility, not knowledge. Having an architecture with some sort of reasoning layer and a working memory that uses the probabilistic latent output rather than the fixed token output might prove powerful.

  • @rakeshmallick8040
    @rakeshmallick8040 3 years ago

    Yes, I agree with your point that evolution is an important part, and the evolutionary process is also there to maximise reward in the environment organisms are in. Take camels, for example: their adaptation to the desert environment lets them consume huge amounts of water and then go without drinking for a long period of time.

  • @julian1971
    @julian1971 3 years ago +1

    I have 2 doubts:
    1) It seems to be assumed that the reward is something which is easily measured by the agent, but it might be quite complicated to detect the reward. For example, how does our pebble-collecting agent know it has successfully collected a pebble? Maybe by detecting an increase in weight on a scale that is similar to what you expect from a pebble. But there may be many ways of increasing that scale reading by that amount that have little to do with collecting pebbles...
    2) What if the reward is exceptionally sparse? E.g. success is, starting from some random location in France, arriving in San Francisco. If no random exploration is ever likely to experience the reward even once (within the agent's lifetime), it is hard to see where the signal to learn from would come from.
    If either of these arguments is correct, then it implies reward shaping will be a big part of reinforcement learning.
    Just my two bits...

    • @SirPlotsalot
      @SirPlotsalot 1 year ago

      With some of the new frameworks around adversarial exploration and aleatoric uncertainty estimation, we can explore pretty well nowadays!
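A minimal sketch of potential-based reward shaping, one standard answer to the sparse-reward worry raised above (Ng, Harada and Russell, 1999, showed this form leaves the optimal policy unchanged). The toy goal-reaching chain, the potential function and the constants are illustrative assumptions:

```python
# r' = r + gamma * phi(s') - phi(s): densify a sparse reward without changing the optimum.
GAMMA = 0.99
GOAL = 10

def potential(state: int) -> float:
    """Heuristic progress signal: closer to the goal means higher potential."""
    return -abs(GOAL - state)

def shaped_reward(state: int, next_state: int, env_reward: float) -> float:
    return env_reward + GAMMA * potential(next_state) - potential(state)

# Sparse environment reward: 1 only upon reaching the goal, 0 everywhere else.
state = 0
for t in range(10):
    next_state = state + 1                       # walk one step toward the goal
    r = 1.0 if next_state == GOAL else 0.0
    print(t, round(shaped_reward(state, next_state, r), 3))
    state = next_state
```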

  • @kpfcgen
    @kpfcgen 3 years ago +8

    "Reward is enough."
    - Arthur Samuel, 1959
    (probably)

  • @nahakuma
    @nahakuma 3 years ago

    30:07 I think you just made the case for why agents should learn to learn: the more they learn, the more reward maximization they can do, then the more intelligent they are, then the more learning they can do... The problem is probably, as you seem to identify, how to start the cycle.

  • @saukraya3254
    @saukraya3254 3 years ago

    Does reward imply the end result is known? If so, it can only train where the end result is known, which is not enough to train an AI to solve novel problems.

  • @kirillnovik8661
    @kirillnovik8661 3 years ago +3

    Very controversial, but interesting! I really appreciate that you are taking the time to examine more philosophical aspects of AI, certainly seems like a niche that needs work

  • @FourTwentyMagic
    @FourTwentyMagic 3 years ago +8

    all you need is all you need

    • @MrChilledstep
      @MrChilledstep 3 years ago +1

      Is all you need is all you need enough?

    • @underlecht
      @underlecht 3 years ago +1

      not all you need is all you need

    • @danielalorbi
      @danielalorbi 3 years ago

      not having all you need considered harmful

    • @zychen8945
      @zychen8945 3 years ago +1

      No nesting

  • @jessemoeller8557
    @jessemoeller8557 3 years ago

    I have often felt that motivation, and I suppose reward, are key things that are lacking in the current AI frameworks.

  • @MrAshtordek
    @MrAshtordek 3 years ago +1

    I agree that their statement is tautological; however, I think it is important to distinguish a learning-capable agent from an intelligent agent. Just because an agent is able to learn does not mean it will learn complex abstractions or generalizable ideas; it might just learn a few heuristics. I will give an example (don't read into the example too much ;p ):
    When you are in school, if you just want to get good grades and do not care to learn generalizable knowledge, you might just memorize all the formulas in the math book, and then when you leave school you have learnt absolutely nothing intelligent. However, if your teacher and book do not provide formulas for all the possible exam questions, you will need to construct the formulas yourself (which I will just say requires understanding; merely remembering doesn't). The difference between those two scenarios is only that in the first scenario the "environment", as it pertains to your reward, has become simple, because to maximize your reward all you need is the list of formulas that already exist in a book, while in the second you have to *create* the solutions that maximize your reward from your environment.
    The reason that humans can exhibit both of these behaviours is not because we are innately intelligent; it is because we are learning-capable agents. However, if you subject an agent that is not learning-capable, for example a bacterium, to the exact same problem, it will not be able to create any reward, because it cannot learn anything (or, as someone else said, "maximising the number of acorns for a bacterium would probably have a zero gradient to learn from").
    I will also point out that it is misleading to say that bacteria "solved" survival and fitness. I mean, heck, all animals/plants are partly descended from some bacteria, where maximizing those bacteria's reward (fitness) led to animals/plants. I think that is also their point to a degree, though I think they also meant it a bit more strongly than: "Evolution is just maximizing a reward. Evolution has led to intelligence. Maximizing a reward can lead to intelligence." Also, I think that is a bit nonsensical, because evolution is *not* maximizing a reward; that is just a formalization that we have put over its head. Evolution is simply the fact that organisms which survive and multiply will survive in greater numbers than organisms that do not, or do so less. Evolution is not an algorithm; it is just an observation of how the world (has) functions(ed) in a given environment (the Earth).

  • @CesiumLifeJacket
    @CesiumLifeJacket 3 years ago +1

    I think your bacteria criticism is a disanalogy: Intelligence in living things is very costly: it takes a lot of complexity and a lot of calories to be smart. Organisms balance that cost/benefit tradeoff as suits their ecological niche. Bacteria aren't dumb because they wouldn't benefit from intelligence, they're dumb because they're too small to implement the complex machinery required to be smart. But if a bacteria could suddenly have human-level intelligence for *free*, such a species of bacteria would totally dominate all life on earth. This "intelligence for free" situation is a better analogy: unless DeepMind starts penalizing their neural networks for how many FLOPS they use or something, more intelligence is basically free, so the marginal reward of more intelligence will almost certainly be positive.

  • @ChibatZ
    @ChibatZ 3 years ago +3

    Very good critique points! Wouldn't have thought of them myself; inspiring! So I'm now wondering if and how evolution can be seen as optimizing reward in an environment and whether it should be considered part of the learning algorithm... Interesting...

  • @jabowery
    @jabowery 3 years ago +1

    Taking the AIXI perspective on AGI (AIXI = AIT ⊗ SDT where AIT = Algorithmic Information Theory and SDT = Sequential Decision Theory):
    The bit string to be losslessly compressed by AIT (to its Kolmogorov Complexity) has to come from somewhere and that "somewhere" has to be, in some sense, "embodied" in its environment so as to receive data from its environment. The structure of its input will not necessarily take in _all_ data available in its environment. In fact, in all but trivially meaningless cases, it takes in a very small fraction of the data available in its environment. This implies a kind of "genetic lossiness" or biological SDT that is enormously lossy based on the evolutionary utility of its embodiment. This seems to me to stop at one level of regress; it's not "AIXIs all the way down". The level at which it stops is reproduction as utility function for evolution.
    So I think it is a mistake to treat bacteria as "unintelligent" in the AGI sense. What people seem to get confused by is the trend toward increasing abstraction with increasing "intelligence" and this kind of abstraction limits at pure AIT. What this means is that while it is true that the neocortex has all kinds of in-built, subjective "priors" relating to the evolutionary reward function of reproduction, it _does_ represent a _direction_ toward pure AIT (which might be thought of as "science" aka what "is" as opposed to what "ought" to be the case informed by our subjective values).
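For readers unfamiliar with AIXI, the comment above leans on Hutter's agent; a commonly cited form of its action choice is sketched below for reference (with m the horizon, U a universal Turing machine and ℓ(q) the length of program q; see Hutter's work for the precise statement):

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$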

  • @alpers.2123
    @alpers.2123 3 years ago +3

    Next paper: enough is enough

  • @Supreme_Lobster
    @Supreme_Lobster 3 years ago +1

    I think the implicit assumption in this paper is that the underlying goal for any system that does have goals is to keep existing. And that the basic requirement for a system to have goals is to at least have the goal to keep existing (which is a bit of circular logic, but we don't really know how life started, and the best explanations we have are quite circular, ie: "by chance some chemical reaction happens the result of which is a simple system that acts to keep existing, and that simple goal creates more systems like it which have the goal to keep existing"). At least AFAIK

  • @drdca8263
    @drdca8263 3 years ago

    So far (am only 12 minutes in so far) this sounds kind of like the idea of convergent instrumental goals / Omohundro goals, as is discussed in AI safety discussion, except without the "and here is how that could turn out badly" emphasis.
    18:19 : Ok, so, the paper argues that general intelligence, and basically all the useful properties we call intelligent, tend to be useful for maximizing most sufficiently complex reward functions in most sufficiently complex environments, but it does not seem to give an argument for why stochastic gradient descent or whatever in a reinforcement learning agent should be able to *become* generally intelligent, just that if it did, that would be useful for maximizing its reward?
    I don't really follow the "they are just defining intelligence as whatever it ends up doing" criticism. Rationality/intelligence should, I think, refer to those mental capabilities and behaviors which would allow an agent to act effectively towards the goals it has, across a large range of scenarios, and in ways that would also work for a large variety of goals.
    It seems plausible that for many sufficiently complex environments and sufficiently complex rewards, an agent which achieves a sufficiently high reward would likely be intelligent.
    Pretty sure this wouldn't apply to all of them though. One can imagine a complex environment which is optimized specifically so that some particular agent achieves the highest reward.
    (e.g. consider the agent which receives as input a bit-string, and produces the output/action of the sha2 hash of (the concatenation of the input with a particular long fixed random-looking constant), and consider an environment which, each round, sends random bitstrings to the agent, and then if the agent's response is what this particular agent would produce as output, it gives it a reward, and otherwise gives no reward. This agent achieves the best possible reward, but is not intelligent, and an intelligent agent, even if given an observation of the other agent's interactions with the environment and its rewards, would have a great deal of difficulty determining how to get any reward.
    Maybe one could argue that such an environment and reward is not "complex"?
    I'm not sure that's really the right thing to say though. The environment may not be complex/sophisticated (observations are simply random bitstrings), but the reward, I think, is complex. Well, I guess in the Kolmogorov-*like* sense of complexity, it might not be so complex, in that all that needs to be specified is the fixed salt concatenated with the input, the sha2 function, and the if equal, then 1, otherwise 0.
    Regardless, one could add complexity in the Kolmogorov-like sense which doesn't matter.
    While less well defined, and perhaps precisely because it is less well defined, I think a better thing to say is necessary than complexity, may be "richness" in a sense.
    A "rich" environment, with many components which have an abstract-able and not-hopelessly-chaotic structure to them, but which are non-negligibly coupled to each-other in many ways, and which is capable of reproducing many phenomena in ways that are both reasonably stable, but not so stable that they cannot be influenced, along with a reward which is connected to the environment in non-trivial but still ordered ways,
    should be one for which intelligence is particularly useful.
    I mean. a rich environment and a rich reward function, should be one in which there are many things that could be studied, and which one could make progress in studying these things, things that admit an easy first approximation for its behavior, and then progressively better approximations for its behavior,
    and where the understanding of these things enables action which exploits these behaviors, influencing how the things behave.
    (for example, "it is possible to pick things up, and drop them, and when they land, it can make a dent in the ground").
    Such a "rich" environment, is one in which I think intelligence would be useful.
    Are most "complex" environments "rich"? I am not sure.
    )
    24:09 : Hmm. Ok, yes, if the reward is "number of you", then bacteria do pretty well.
    However, suppose that there wasn't like, resource costs in order to be intelligent.
    Would bacteria not benefit from being intelligent?
    Like, suppose that each bacterium had a magic psychic antenna to something outside the physical universe, where for each bacterium there was a very smart intelligence with plenty of memory which could pick up on the signals the bacterium sent to the psychic antenna, and could send signals to the psychic antenna which directed the behavior of it (e.g. "make more of this protein. stop making this protein", etc. ), and that these psychic antennas cost very little for the bacteria to produce when reproducing and such.
    Would it not then benefit the bacteria to have these psychic antennae? I think it would!
    Bacteria aren't general intelligences, but if they could be for almost 0 cost, I think that would allow them to be able to replicate more effectively. (Perhaps they would end up binding together to form larger structures, that could then later re-separate, or something. They would at least be more effective, I think, in hunting and consuming other bacteria species. I'm pretty sure some single celled organisms do that.)
    Then, it isn't that intelligence wouldn't be helpful for bacteria, but that it isn't available and/or is too costly?
    27:45 : oh, they do say reinforcement learning, huh.
    Also, maybe when they are saying that the way that an agent can be capable of maximizing reward, could be by a kind of bootstrapping thing? Like, if it gets a high enough reward, then it must be such that it is capable of learning to maximize reward, so it can get a higher reward? idk.
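A runnable sketch of the "hash agent" thought experiment from the comment above. The use of SHA-256 specifically, the salt length and the reward rule are illustrative assumptions; the point is just that this fixed lookup-style agent gets maximal reward in the environment built around it while exhibiting nothing we would call intelligence.

```python
# The agent outputs sha256(observation || fixed secret salt); the environment rewards
# exactly that output and nothing else, so this agent is "optimal" yet not intelligent.
import hashlib
import os

SALT = os.urandom(32)                # the "long fixed random-looking constant" (assumed length)

def agent(observation: bytes) -> bytes:
    """The non-intelligent agent: a pure keyed hash of its observation."""
    return hashlib.sha256(observation + SALT).digest()

def run_environment(action_fn, n_rounds: int = 5) -> int:
    """Environment sends random bitstrings; reward 1 iff the action matches what
    this particular hash agent would output, else 0."""
    total_reward = 0
    for _ in range(n_rounds):
        obs = os.urandom(16)
        target = hashlib.sha256(obs + SALT).digest()
        total_reward += int(action_fn(obs) == target)
    return total_reward

print(run_environment(agent))        # 5: maximal reward, no intelligence required
```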

  • @maxhennick5009
    @maxhennick5009 3 years ago

    Great video, and I think you raise many good points about their work. I would like, however, to bring up that in some sense evolution does have a representation as a sort of "reward". It's sort of dumb, but essentially the reward function of evolution is the propagation of genes. This sort of idea was put forward in the book The Selfish Gene. Essentially, it just makes the observation that genes that are well suited to being passed on will be passed on, and those that are not will die off. In mathematical biology, we tend to refer to evolution as maximizing the fitness of a population. But this is essentially just maximizing a reward, in a way. Still, I think you're correct that there seem to be a lot of assumptions made in this work that are not strictly true.

    • @laptoptv9093
      @laptoptv9093 2 years ago

      This is an inaccurate representation of evolution and The Selfish Gene. There is no reward as such, just that what we observe after running the "evolution simulation" is that agents who replicated themselves better (than competitors, and given the environmental constraints) are more prevalent.

    • @maxhennick5009
      @maxhennick5009 2 years ago

      @@laptoptv9093 I mean, from a mathematical standpoint evolution attempts to maximize a measure of fitness. This is well understood in the literature.
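A toy selection sketch related to the exchange above (an assumed minimal model, not something from the paper or The Selfish Gene): no reward function is coded anywhere, yet variants that happen to replicate more become more common, which is all that "maximizing fitness" describes after the fact.

```python
# Variants are just numbers: each value is that variant's probability of replicating.
import random

random.seed(0)
population = [0.5] * 200                  # start with identical middling replicators
for generation in range(60):
    offspring = []
    for p in population:
        if random.random() < p:           # the variant survives and replicates
            for _ in range(2):            # two offspring, each slightly mutated
                offspring.append(min(1.0, max(0.0, p + random.gauss(0.0, 0.05))))
    if not offspring:                     # extinction guard for the toy model
        break
    population = random.sample(offspring, min(200, len(offspring)))

print("mean replication probability:", round(sum(population) / len(population), 3))
```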

  • @pensiveintrovert4318
    @pensiveintrovert4318 3 years ago +2

    Actually, the Second law of thermodynamics is all you need. Increase in entropy is all you need for intelligence to evolve.

    • @cebedeo2918
      @cebedeo2918 3 years ago

      I find this idea interesting. Could you please explain with more detail?

    • @pensiveintrovert4318
      @pensiveintrovert4318 3 years ago

      @@cebedeo2918 At the basic level, nature reduces chaos in parts of a closed system in order to increase chaos much more in the rest of the same system.
      I can't point you to any literature, because there is none.

      @cebedeo2918 3 years ago
      @cebedeo2918 3 роки тому

      @@pensiveintrovert4318 I see, so intelligence would spawn as a counterweight of sorts of some high concentration of chaos somewhere else?

    • @pensiveintrovert4318
      @pensiveintrovert4318 3 years ago

      @@cebedeo2918 Hm. Not as a counterweight to chaos, but rather the most effective agents of faster/higher chaos. Living things seek energy at low entropy (high energy photons, for example) and turn them into low energy photons with higher entropy as the result. With lots of intermediate steps of course.

    • @cebedeo2918
      @cebedeo2918 3 years ago

      @@pensiveintrovert4318 I think then you explained it the other way around, when you said that "At the basic level, nature reduces chaos in parts of a closed system in order to increase chaos much more in the rest of the same system." I understood there that the 'purpose' of nature would be increasing the chaos of the rest of the same system. But I see now that you meant that this increase in chaos is the cost of the reduction some element of the system has forced into itself, right?

  • @ChaiTimeDataScience
    @ChaiTimeDataScience 3 years ago +3

    New video already!?
    I'll have to miss another Zoom now.

  • @florianhonicke5448
    @florianhonicke5448 3 years ago +2

    Really interesting point you make!

  • @taku8751
    @taku8751 2 years ago +1

    Evolution requires both mutation and rewards. But the paper is not about evolution but about artificial general intelligence, so the background of evolution can be ignored. I think the paper is right: reward is enough. It's just that Aristotle discussed this a long time ago.

  • @Huxya
    @Huxya 3 years ago

    Enter Moravec's paradox. LOL. Yes, the "rewarding" part is easy. Try solving the "wanting" part... make something that feels pain, hunger or fear, and it will evolve itself to AGI in minutes.

  • @JensDoll
    @JensDoll 3 years ago +5

    Well, there is a prime example that maximising a really primitive reward/cost function may lead to really complex systems, including intelligent agents: evolution.

    • @billykotsos4642
      @billykotsos4642 3 years ago

      It takes billions of years of evolution to arrive at humans!
      Hopefully quantum will help!

    • @DanielWolf555
      @DanielWolf555 3 years ago

      That's what I also thought. According to the modern Darwinian view there is no intelligent "creator" of man and beast. So intelligence must have evolved merely by following a certain reward (the reward being survival, in the case of evolution).

    • @eelcohoogendoorn8044
      @eelcohoogendoorn8044 3 years ago

      Beat me to it. They could shorten this paper to: 'I believe in evolution.'

    • @JensDoll
      @JensDoll 3 years ago

      And I must disagree with Yannic on his view of this. From a system-complexity point of view there is not much of a difference between a bacterium and an animal. They are both incredibly complex compared to unorganized matter.
      Yes, the bacterium is not an intelligent agent (at least from our perspective), but it's still incredible to see what optimizing a simple cost function produces.

  • @CristianGarcia
    @CristianGarcia 3 years ago +3

    I don't agree with the "bacteria solve the survival problem without intelligence" statement. I am not an expert in this field, but if you read a little bit about cellular biology you will see that even unicellular organisms are information-processing systems: they have to pick up molecular signals from their environment and use them to take actions. There was a great video of a white cell pursuing a bacterium; they clearly perceived each other. You can even argue that some of the molecular machinery that cells use internally is somehow intelligent, since it performs tasks.
    If linear regression is intelligence, I am fairly sure these systems are too :)

    • @TheRyulord
      @TheRyulord 3 years ago

      Linear regression isn't intelligence, and neither are bacteria. If we decide that anything doing anything resembling perception and information processing is intelligent, then Roombas and video game characters are intelligent. This just isn't what people are talking about when they talk about intelligence.

    • @CristianGarcia
      @CristianGarcia 3 years ago

      Intelligence is not a binary property. If it is for you, you will fail to explain when it began: who was the first human or animal to be "intelligent"? When in evolutionary history did it appear? There is no such point; it's a continuum.
      Also, there is a ton of research into intelligence outside of machine intelligence; please at least look at research from the Santa Fe Institute before making such claims.

    • @TheRyulord
      @TheRyulord 3 years ago

      @@CristianGarcia Lots of different people and labs have their own pet definitions of intelligence, and we could use any one of them if we wanted to. The utility of definitions is that communicating certain ideas becomes easier. The definition you're using is 1) so overly broad that it seems synonymous with information processing, making it completely redundant, and 2) leaves us without a word for the particular type of processing that humans and some animals do, which is generally considered to be qualitatively, and not merely quantitatively, different from what a bacterium or a Roomba are doing.

    • @nahakuma
      @nahakuma 3 years ago +1

      @Ryu Well, information processing is to a large extent what we "humans and some animals" do. What is missing? Optimization toward goals? Communication abilities? Learning? You may be surprised to know that bacteria display all such abilities to a certain extent. Also, I'm sure you'd say there are people "smarter" than others. Why not apply that metric to anything? Why shouldn't a general theory of intelligence account for any system? Your way of thinking certainly is not compatible with physics, where general concepts such as length, energy, mass, etc., are used to understand all possible systems. The fact that you can assign an energy to anything certainly doesn't make the concept redundant or overly broad.

    • @TheRyulord
      @TheRyulord 3 years ago

      @@nahakuma I'm not saying that humans and other animals don't do information processing. I'm saying that intelligence is not *merely* information processing. If I want to talk about information processing I can and that's great but if I want to talk about the specific kind of information processing that most people call intelligence then it's convenient to have a word for that. I also never meant to imply I think intelligence is binary but that doesn't exclude the possibility of there being things which are totally non-intelligent. I might have a light in my house on a dimmer switch but "completely off" is still a state it can be in. I'm also not really sure where the comment about physics is coming from. If I say that some things are chairs and other things are not chairs, no one would accuse me of saying that my "way of thinking certainly is not compatible with physics" unless I concede that every physical system is chair-like to some degree.

  • @Kaslor1000
    @Kaslor1000 3 years ago +3

    inb4 evolutionary algorithms will be the next breakthrough in ml

    • @kaniblurous
      @kaniblurous 2 years ago

      I feel this is possible but not in the way we are familiar with. I am thinking that in fact PonderNet could be the next big step.

  • @mrpocock
    @mrpocock 3 years ago

    Biologically, all rewards are artificial rewards. This whole paper feels like motivated reasoning. Perhaps a more interesting idea is to think of play as a hack to train up subsystems that would otherwise have too little sampling depth?

  • @alonkellner5375
    @alonkellner5375 3 years ago

    About 23:05 up to 25:09 - I do not agree with this criticism, it seems to me that they do not assume that the *only* solution to any complex problem *is* general intelligence, but that the *best* solution *requires* intelligence.
    And still - even this proposition is not really valid, since general intelligence is mainly useful at the stage of growth in power of the agent, once an agent has reached ultimate power (humanity is gone, it can replicate, it collects all available energy, pebbles and anti-pebbles are created in intergalactic factories etc.) it no longer really needs general intelligence to operate, it only needs to know very specific tasks really well, since all other tasks are no longer beneficial.
    Maybe another assumption to be made is that the agent must have a way to improve and evolve beyond its current state for general intelligence to be useful.

  • @forcanadaru
    @forcanadaru 3 years ago

    Nice hypothesis. If we want to reverse-engineer real life in the hope that it will produce some kind of intelligence in the future, we need to remember that life (probably!) started from a self-sustaining chemical reaction that changed its environment so that the environment started to serve one goal: to support that reaction. Bacteria dominated our planet for around 4 billion years, and what about humans? One thing is soothing: we develop very fast.

  • @sebastienleblanc5217
    @sebastienleblanc5217 2 years ago

    The pressure to crank out papers in science seems to prevent scientists from stopping and really thinking deeply about what they are researching. It may be one reason why we get papers filled with tautologies like this one.

  • @I_have_solved_AGI
    @I_have_solved_AGI 1 year ago

    Seriously you need to talk about my impact maximization

  • @samdirichlet7500
    @samdirichlet7500 3 years ago

    1) There's plenty of evidence that animals are not rational actors. They don't work to maximize rewards. Humans, for example, view loss and reward on different scales. For example, here's a game: player A is given $100 to divide between himself and player B. Player B decides whether or not the division is acceptable; if not, each player gets nothing. If so, the players get A's allocation. Player B's maximizing strategy is to accept any allocation where he gets 1 or more dollars. What actually happens is that if the allocation is not considered fair, B will reject it and accept the loss.
    2) Silver et al. have been playing games and training spastic robots with deep-net-based RL for 6 years(?) now. I'm still not aware of practical applications outside of very rigid environments. For example, I've seen robotics guys dismiss reinforcement learning as too expensive and risky to apply practically. I've seen some neat demos but nothing anyone would want to deploy.
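A quick sketch of the ultimatum game described in point 1 above. The $30 fairness threshold is an illustrative assumption; the point is the gap between the reward-maximizing responder and what humans are observed to do.

```python
# Two responder policies for the ultimatum game: pure reward maximization vs. fairness.
def responder_reward_maximizer(offer: int) -> str:
    return "accept" if offer >= 1 else "reject"      # any positive share beats nothing

def responder_fairness(offer: int, threshold: int = 30) -> str:
    return "accept" if offer >= threshold else "reject"   # forfeits reward on unfair splits

for offer in (1, 10, 30, 50):                        # offer = player B's share of the $100
    print(offer, responder_reward_maximizer(offer), responder_fairness(offer))
```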

  • @Kerrosene
    @Kerrosene 3 years ago

    For that bacteria example they say that "The agent has a limited capacity determined by its machinery"..which seems appropriate no? If bacteria had the same mental and physical machinery as humans..etc etc

  • @VincentKun
    @VincentKun 3 years ago

    I have a thing to say about your criticism: when you say that we are one niche and the bacteria are another, you kind of differentiate between the two (or more) niches. So you are saying that if the reward (e.g. surviving) has to be maximized, then in our species a lot of complex behaviours emerge that are related to the environment in which we live, but in the bacteria's environment there are other behaviours for which intelligence is, we could say, useless (there is no intelligence in producing a protein, for example).
    So we need to weaken the statement that they propose into something more specific.
    In addition, I still have doubts about the "if you give the **collect pebbles, maximize reward** stuff, the agent will get to Mars to collect pebbles" claim.
    PS: Btw, help, the pebble apocalypse is getting real

  • @aspergale9836
    @aspergale9836 3 years ago

    At the risk of being _that guy_, I gotta ask: what exactly does this enable? I'll need to read the paper at some point (it's paywalled), but I'm just missing _the point_. What does their argument enable?

  • @zychen8945
    @zychen8945 3 years ago

    If we have general intelligence then we will have higher reward. Even if this is true, does it mean that reward can sufficiently lead to intelligence?

  • @sandraviknander7898
    @sandraviknander7898 3 years ago +6

    Yeah, maximising the number of acorns for a bacterium would probably give a zero gradient to learn from. Let's say you instead maximise the number of individuals; then you would most likely never arrive at being a squirrel, because squirrels are not as good at producing a lot of individuals as bacteria are. So in the end you need a reward function as complex as your environment. At least, that's the only way I see that you would be able to maximise for the more complex rewards.

    • @soccerplayer922
      @soccerplayer922 3 years ago +1

      Look at open-ended search. I think this direction provides a much more compelling way to approach these problems. I find that a lot of people have a bit of magic in their RL model that just somehow promotes complexity without actually specifying how this occurs.

    • @sandraviknander7898
      @sandraviknander7898 3 years ago +1

      @@soccerplayer922 thanks for the tip :)

  • @inostrantevia
    @inostrantevia 3 years ago

    A bacterium doesn't just split and thrive; it also needs to gather energy and material and manage to live long enough, so in that sense it is intelligent. And it's unclear whether it really gets rewarded; you can't assume thriving = reward.

  • @NextFuckingLevel
    @NextFuckingLevel 3 years ago +3

    Big money is all you need

  • @jamiekawabata7101
    @jamiekawabata7101 3 years ago

    Excellent discussion, and I agree. Reproduction of biological organisms as a feedback system can be considered a reward, but almost all systems produced in this way are not intelligent. Natural selection and sexual selection are extremely different (and sometimes opposing) forces, in competing for resources vs. competing for mates. I believe the latter is what has driven intelligence in humans. After the fact it is observed that intelligence is tremendously useful in dealing with nature but that is not the pressure that produced generally intelligent creatures.

  • @underlecht
    @underlecht 3 years ago +2

    General intelligence is a consequence of evolution. Successful reproduction is the reward. And yes, intelligence is tightly coupled with the environment. Simply simulating an agent "getting intelligent" would require many stupid agent "deaths" until a generation comes along that is adapted to the environment.

    • @Adhil_parammel
      @Adhil_parammel 2 years ago

      Selection by females into the next generation is another option.
      That's essentially an algorithm for selecting efficient algorithms and replicating those efficient algorithms with slight deviation.

  • @DamianReloaded
    @DamianReloaded 3 years ago +2

    In some sense this would imply a direct relation between the complexity of the intelligent system and the complexity of the environment (what kind of complexity, though?). But I think human intelligence was built in layers over the span of hundreds of millions of years, one improvement after another, each improvement setting the foundations for the next one (software/hardware wise). So all the different subsystems that make up human intelligence didn't come to be all at once but gradually, probably many times individually: adaptations that solved a specific problem at a specific time and maybe got repurposed later. :/

    • @rentedbagel4956
      @rentedbagel4956 3 years ago

      Evolution is greedy; we can expect AI to be better since it is not burdened with software/hardware legacy.

    • @DamianReloaded
      @DamianReloaded 3 years ago +2

      @@rentedbagel4956 Evolution works in numbers and "sticks" to what works best. Babies with big heads that are hard to give birth to, and that can't walk or feed themselves for years, don't seem like a greedy strategy. Maybe our ancestors had an aquatic period and those adaptations then remained and got repurposed for the desert, which is still more fuzzy than greedy imo. :/

    • @TheRyulord
      @TheRyulord 3 years ago

      @@DamianReloaded Evolution is greedy in the sense that if an organism carries a mutation, evolution can't check whether that mutation will be useful 100 generations from now. It can only check whether it's useful in the current generation. Imagine there are three mutations that each individually lower an individual's fitness but together cause a massive increase in fitness. If these mutations only occur one at a time, evolution will prevent any of them from sticking around long enough for all three to become established. Evolution gets stuck in local optima.
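      A toy illustration of that argument (my own sketch, not from this thread): a greedy single-mutation hill climber on a three-bit genome whose fitness drops for any single mutation but jumps when all three are present. The fitness values are arbitrary assumptions.

```python
import random

def fitness(genome):                     # genome: tuple of three bits
    k = sum(genome)
    return {0: 1.0, 1: 0.5, 2: 0.5, 3: 2.0}[k]   # single mutations hurt; all three together help

random.seed(0)
genome = (0, 0, 0)
for _ in range(10_000):
    i = random.randrange(3)              # one random mutation at a time
    candidate = tuple(b ^ (1 if j == i else 0) for j, b in enumerate(genome))
    if fitness(candidate) > fitness(genome):     # greedy: only strictly better variants survive
        genome = candidate

print(genome, fitness(genome))           # stays at (0, 0, 0): stuck in the local optimum
```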

    • @DamianReloaded
      @DamianReloaded 3 years ago

      @@TheRyulord I'm not sure about this. Evolution really doesn't "check" anything. Not all mutations that occur and remain are meant to be used. If they are random, they just happen; if they are not detrimental, they remain. It may happen that generations later they prove useful. This would make evolution a process that spans generations. So the "check" would happen not at the moment the mutation occurs but at the moment the mutation fails to help reproduction. Individuals that survive and multiply carry all the mutations that occurred, not just the improvements.

    • @TheRyulord
      @TheRyulord 3 years ago

      @@DamianReloaded There are of course plenty of mutations which have little to no effect on fitness and so their frequency in the population will just fluctuate randomly, potentially vanishing entirely or becoming ubiquitous; they don't necessarily just "remain". Evolution can't "look ahead" and see that a mutation that's currently neutral will be valuable many generations from now though so there won't be any selective pressure to make sure the mutation sticks around until then. The "check" happens every generation. Mutations that are currently beneficial will tend to go up in frequency, mutations that are currently detrimental will tend to go down in frequency, and mutations that are currently neutral will do a random walk. The future values of the mutations never play a role.

  • @HoriaCristescu
    @HoriaCristescu 3 years ago +3

    It works out if you add self-replication, or winning rounds of the "evolution game", as a reward. Bacteria do adapt to their environment; they are as intelligent as it is optimal for them to be.

  • @EsotericAI
    @EsotericAI 3 years ago

    About the "Learning -> Maximizing Reward -> Intelligence -> Learning" loop, or circular reference: it's only a circular reference if the "learning" value, the "maximizing reward" value and the "intelligence" value are fixed. A small degree of learning ability perhaps must be available to learn how to maximize reward "better", which in turn must be available to develop better intelligence, which in turn must be available to improve your learning abilities. The event that triggers the "loop", for humans, can be our preset structure at birth. So evolution isn't part of the loop; it's just responsible for creating the setup for the loop to start spinning.
    There are obviously other systems created by evolution that don't require intelligence to survive. Looking at human-level general intelligence, I strongly disagree that any "part" or ability should be treated as distinct and separate from the other parts, like one key part behind all the others; I suggest instead that they are all in a symbiotic relationship helping each other, so they are all in a closed loop, depending on each other.
    In addition, I think the paper's biggest failure is being so vague about the "complex environment". At the very least I strongly suggest separating "environment" into two concepts: the environment that results from interaction and is influenced by the agent, including the body of the agent itself, and the environment "out of reach" of the agent. In some sense the distinction between the agent and its environment is less important than the distinction between the possible future environments (changed, altered or influenced by the agent's actions) and the "static" environment that the agent cannot influence.

  • @jonatan01i
    @jonatan01i 3 years ago +6

    Reward is all you need

  • @wlorenz65
    @wlorenz65 3 years ago

    0:40 Environments do not provide rewards. Bodies provide rewards. Other agents in the environments can sometimes provide rewards but most environments are toy environments which do not contain other agents.

  • @aniruddhadatta925
    @aniruddhadatta925 3 years ago

    And then there are Decision Transformers, but as has been pointed out, a Decision Transformer only performs up to the maximum reward present in the dataset it was trained on... so is reward actually all we need?
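    A hedged sketch of why that cap exists: at test time a Decision Transformer is conditioned on a target return-to-go that gets decremented as rewards arrive, and asking for a target beyond the best return in the training data puts the conditioning off-distribution. `model` and `env` below are hypothetical stand-ins with made-up interfaces, not any real implementation.

```python
def rollout(model, env, target_return, max_steps=100):
    """Return-conditioned rollout; `model` and `env` are hypothetical placeholders."""
    state = env.reset()
    states, actions, returns_to_go = [], [], []
    rtg = target_return                    # desired return; values above the dataset's best
    for _ in range(max_steps):             # return are off-distribution for the model
        states.append(state)
        returns_to_go.append(rtg)
        action = model.predict(states, actions, returns_to_go)  # autoregressive action prediction
        actions.append(action)
        state, reward, done = env.step(action)
        rtg -= reward                      # decrement the return-to-go by the observed reward
        if done:
            break
    return target_return - rtg             # the return that was actually achieved
```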

  • @charlesfeng3823
    @charlesfeng3823 3 years ago

    Intelligence has at least 2 attributes:
    1. Autonomy, 2. Flexibility
    As such, it is impossible to define it in the language we have now. Once we expect AI to be explainable, or expect AI to perform fully in accordance with our anticipation, its flexibility collapses.
    In that case, either its level of intelligence deteriorates or it fails to be deemed intelligent.

  • @JorgeGonzalez-xt3jb
    @JorgeGonzalez-xt3jb 3 years ago

    Why is evolution not a valid reward-maximization mechanism? Aren't there genetic algorithms that can optimize even deep nets? Something like fitness = total reward could do (...maybe I didn't understand that point in the video).
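    Something like that does exist (neuroevolution / evolution strategies). Below is a minimal sketch of the idea on a made-up one-dimensional toy task rather than a real environment: mutate a linear policy and keep the child whenever its total episode reward (its fitness) is higher. Every detail of the task is an assumption for illustration only.

```python
import numpy as np

def episode_return(weights, steps=50):
    """Toy 1-D task: the agent starts at position 5 and is rewarded for staying near 0."""
    rng, pos, total = np.random.default_rng(0), 5.0, 0.0
    for _ in range(steps):
        action = float(np.clip(weights[0] * pos + weights[1], -1, 1))  # linear policy
        pos = pos - action + rng.normal(scale=0.1)                     # noisy dynamics
        total += -abs(pos)                                             # reward: closeness to 0
    return total

rng = np.random.default_rng(1)
best = np.zeros(2)
for generation in range(300):
    child = best + rng.normal(scale=0.1, size=2)        # mutate the parameters
    if episode_return(child) > episode_return(best):    # fitness = total episode reward
        best = child

print(best, episode_return(best))
```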

  • @cheukkinpoon4428
    @cheukkinpoon4428 3 years ago +1

    Was expecting a walk-through of a good paper...

  • @boxeryy6661
    @boxeryy6661 10 months ago

    It seems like imitating evolution, but that is itself complicated: how do you design the environment and reward so that an agent evolves into an intelligent agent?

  • @Markste-in
    @Markste-in 3 years ago

    I would agree with the hypothesis if they had weakened it a bit, from "intelligence will arise" to "intelligence can arise"... but who am I.

  • @fast_harmonic_psychedelic
    @fast_harmonic_psychedelic 3 years ago

    Also, it requires language. No language means no intelligence. Language is the key. Rewards play a role too. None of these individual factors are "enough" on their own. But ultimately it doesn't matter, because the way it really works is how it will work: if they want to believe it was purely the reward that caused it, they'll just be wrong, but it will still be intelligent.

  • @jonatan01i
    @jonatan01i 3 years ago

    If you're able to understand that doing x in a situation is good, it's worth nothing if you don't remember that the next time you're in that situation.
    Your hardware must be adequate for the things you need to be able to understand.
    But I guess they talk about the situation where you already have that?

  • @namesurname7498
    @namesurname7498 3 years ago

    Hi Yannic, do you not think that maybe we can think of "survival" as the reward in the evolutionary process? Or rather, if natural selection is the process that drives evolution, maybe death (or something like species extinction) is the negative reward? In this case fitness would be an emergent property of an agent that maximises its chance of survival (where an "agent" could even be an entire species?), and that agent might be very dumb... I am nowhere near as qualified as you or anyone writing this paper, so I would love to be corrected. Thanks

    • @drdca8263
      @drdca8263 3 years ago

      Treating death as the sole (negative) reward signal doesn't really explain the species that basically always die shortly after mating (like cicadas, iirc? quite a few kinds of bugs, iirc).

    • @namesurname7498
      @namesurname7498 3 years ago +1

      @@drdca8263 That's true... The reward somehow would have to be representative of the state of the species as a whole, which might be impossible. Thanks

  • @raminbakhtiyari5429
    @raminbakhtiyari5429 2 years ago

    😍😍

  • @jonatan01i
    @jonatan01i 3 years ago +1

    Finite Resources Is All You Need

  • @sieyk
    @sieyk 3 years ago

    Isn't this basically saying "Yeah, we can't figure out how to reduce an environment enough that the loss function isn't noisy. Just let the RL thing do it all lol."
    The assumptions made in this paper are wildly optimistic. How would an RL agent even receive the input? You instantly end up back in the same spot as before you started. The problem with AGI is not the inability of machines to do it; the problem is the reduction of noisy reality into perfect information. We can see that RL can solve traditionally intractable games such as Go and DotA (limited), but the information in those environments is perfect or highly consistent.
    The main issue is the lack of a sufficient generalisation method for filtering natural data; you lose a huge amount of intellectual capacity when you need to account for noise. The problem is that reality is 99.9% noise and 0.1% information (likely less), whereas Go is 100% information and DotA probably has

  • @my3bikaht88
    @my3bikaht88 3 years ago

    I like the idea that the evolutionary driver is actually the absence of any reward. No food, no adaptation, no reproduction, etc., and you're gone. A loss function makes more sense in this case because you just can't specify all possible rewards for complex systems.

    • @my3bikaht88
      @my3bikaht88 3 years ago

      And defining a reward (or set of rewards) looks to me like another attempt to answer the question of the meaning of life.

  • @espadrine
    @espadrine 3 years ago

    The core is missing, and it is too bad, because it was (somewhat implicitly) defined in the MuZero paper.
    The reason this can be bootstrapped is not reward; it is that actions taken based on a simulated model of the environment yield a worse reward in the real environment. Exploiting that gap to improve the model is learning.
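    A rough sketch of how one might read that gap idea (my own toy version, not MuZero): the agent's learned model predicts a reward, the real environment returns another, and the squared difference between the two is the signal used to improve the model. The linear "environment" and its weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])      # hidden reward function of the "real environment"
model_w = np.zeros(2)               # the agent's learned (simulated) reward model
lr = 0.1

for step in range(1000):
    features = rng.normal(size=2)                         # features of an encountered (state, action)
    simulated = model_w @ features                        # reward the internal model expects
    real = true_w @ features + rng.normal(scale=0.1)      # reward the real environment delivers
    gap = simulated - real
    model_w -= lr * gap * features                        # exploit the gap to improve the model

print(model_w)   # approaches true_w: the simulation-vs-reality gap shrinks as the model learns
```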

  • @ogito999
    @ogito999 3 years ago

    I don't think general intelligence will be achieved through maximization or minimization.
    In fact I think that we won't be able to create a process which automagically generates AGI, before we already have AGI.
    Brain-machine interfaces are probably our best avenue to creating AGI.
    Being able to extract tons of raw data from the brain along with some intention-detection mechanism that tries to satisfy queries generated within the brain, would probably create something that's at least kind of smart.
    And such an intention-detector can be thought of as a very high level programming language that most anybody can use with just "common sense".
    And even that just speeds up the process of programming AGI; I'm not sure exactly how much concentrated effort it would take to create something "conscious" and autonomous even with such futuristic technological capabilities, but I'm guessing it's not going to be easy.
    To end my rambling, reward is almost definitely not enough, our current technological capabilities are probably not enough, our current philosophical understanding of "intelligence", "consciousness", "learning" are also probably not enough.
    These papers greatly trivialize the complexity of the task at hand, but I guess it's easy to say that we're almost there when we have no idea where we are.

  • @valkomilev9238
    @valkomilev9238 3 years ago

    So this paper says that loss = 1.00 is bad and loss = 0.00 is good?

  • @Supreme_Lobster
    @Supreme_Lobster 3 years ago +1

    If I understand it correctly, the argument is more about evolution than about intelligence. And we know that evolution leads to intelligence because here we are, and so are the tens of thousands of animal species that are "intelligent", and we are all products of evolution, so it seems like a self-fulfilling prophecy (a tautology, if you will).

    • @EsotericAI
      @EsotericAI 3 years ago

      Well, only if you look at it as a straight line of progress. It's more like different systems in relationship with each other:
      Evolution is a good system for generating systems.
      One generated system can be good at spawning systems that develop intelligence by reward maximization.
      So the reward maximization -> intelligence step is not required for the evolution system to keep spawning systems.
      I disagree that the lack of reward maximization in evolution itself should be used as an argument against the claim that intelligence needs reward maximization. The evolution process should be considered a parent system outside the inner systems that give birth to intelligence. It's like saying it's not the attention mechanism that gives the GPT-3 model its NLP skills because Python 3 (or whatever programming language was used when GPT-3 was trained) or the server hardware doesn't need an attention mechanism to function.
      Evolution is a background system that, yes, is required, but no, should not be used as the reference for how intelligence is born.

  • @pensiveintrovert4318
    @pensiveintrovert4318 3 years ago +1

    Evolution is a classic RL system. What is the beef here? Bad designs die off, good ones multiply.

  • @KokosKeks
    @KokosKeks 3 years ago +1

    Is this one of those joke papers you solicited to drive the acceptance rate down, and it somehow got published anyway?
    It seems to me that the authors rediscovered instrumental goals without realising it.
    Claiming a squirrel is maximising food consumption seems awfully presumptuous. I think it is possible to invent a reward function for any sufficiently complex agent so that it appears to maximise the given reward on a cursory inspection.
    That does not mean the agent is actually trying to maximise that reward, or indeed any reward (for a useful definition of reward that restricts it to computable functions).
    Let alone that maximising such a reward would lead to general intelligence.
    Furthermore, the given examples seem to fall into one of two categories:
    1. agents trained with an explicit reward (like Go) that don't seem to have developed much transferable knowledge (probably because they don't actually think with separate specialized abilities, regardless of how we classify their actions);
    2. agents that exhibit general abilities (like humans) for which a supposed training reward is ascribed without actually witnessing the training process.
    (Note that economists have been modelling human behaviour for decades as agents maximising their available money, but that this modelling breaks down on the individual level.)
    Finally, I want to say that even if our models become flexible enough to work with brute-force reinforcement learning in the future (which might take a long time if the focus remains on just scaling up models), I don't think reward maximisation like this will lead to training actually desirable methods for approaching general tasks.
    But what do I know, I just post my speculation in youtube comments, not in actual papers.

  • @chanel454879876354
    @chanel454879876354 3 years ago +1

    I wish to merge RL with a GAN, to train the agent inside a reconstructed version of the environment it has seen, and to use exploration as the goal (a toy sketch of the exploration-as-reward idea appears after this thread).

    • @dexterovski
      @dexterovski 3 years ago +1

      Schmidhuber did it first: worldmodels.github.io/ (kinda).
      There are also newer papers similar to this; look them up.

    • @chanel454879876354
      @chanel454879876354 3 years ago

      @@dexterovski Awesome! Exactly what I mean!

    • @dexterovski
      @dexterovski 3 years ago +1

      @@chanel454879876354 You can also look up Dreamer V2.
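    A tiny sketch of the "exploration as goal" part of this thread (my own toy version, in the spirit of curiosity-driven exploration and world models, not the code of the cited papers): the prediction error of a learned forward model is used as an intrinsic reward, and that reward decays as the world model improves. The dynamics matrix, states, and learning rate are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
true_A = np.array([[0.9, 0.1], [-0.2, 0.8]])   # unknown environment dynamics
model_W = np.zeros((2, 2))                     # learned linear forward model
lr = 0.05

for step in range(500):
    state = rng.normal(size=2)                             # a state visited while exploring
    predicted_next = model_W @ state                       # what the world model expects
    next_state = true_A @ state + rng.normal(scale=0.01, size=2)
    intrinsic_reward = float(np.sum((predicted_next - next_state) ** 2))  # "surprise" as reward
    model_W -= lr * np.outer(predicted_next - next_state, state)          # fit the world model
    if step % 100 == 0:
        print(step, round(intrinsic_reward, 4))   # the intrinsic reward shrinks as the model improves
```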