Building a GENERAL AI agent with reinforcement learning

  • Published 16 May 2024
  • Dr. Minqi Jiang and Dr. Marc Rigter explain an innovative new method for making the intelligence of agents more general-purpose: training them to learn many worlds, reward-free, before the usual goal-directed training known as reinforcement learning. (A minimal sketch of the idea appears after the chapter list below.)
    Their new paper is called "Reward-Free Curricula for Training Robust World Models" arxiv.org/pdf/2306.09205.pdf
    / minqijiang
    / marcrigter
    Interviewer: Dr. Tim Scarfe
    Please support us on Patreon; Tim is now doing MLST full-time and taking a massive financial hit. If you love MLST and want it to continue, please show your support! In return you get early access to shows, plus a private Discord and networking. / mlst
    We are also looking for show sponsors; please get in touch if interested: mlstreettalk at gmail.
    MLST Discord: / discord
    00:00:00 - Intro
    00:01:05 - Model-based Setting
    00:02:41 - Similar to POET Paper
    00:05:27 - Minimax Regret
    00:07:21 - Why Explicitly Model the World?
    00:12:47 - Minimax Regret Continued
    00:18:17 - Why Would It Converge
    00:20:36 - Latent Dynamics Model
    00:24:34 - MDPs
    00:27:11 - Latent
    00:29:53 - Intelligence is Specialised / Overfitting / Sim2real
    00:39:39 - Openendedness
    00:44:38 - Creativity
    00:48:06 - Intrinsic Motivation
    00:51:12 - Deception / Stanley
    00:53:56 - Sutton / Rewards is Enough
    01:00:43 - Are LLMs Just Model Retrievers?
    01:03:14 - Do LLMs Model the World?
    01:09:49 - Dreamer and Plan to Explore
    01:13:14 - Synthetic Data
    01:15:21 - WAKER Paper Algorithm
    01:21:24 - Emergent Curriculum
    01:31:16 - Even Current AI is Externalised/Mimetic
    01:36:39 - Brain Drain Academia
    01:40:10 - Bitter Lesson / Do We Need Computation
    01:44:31 - The Need for Modelling Dynamics
    01:47:48 - Need for Memetic Systems
    01:50:14 - Results of the Paper and OOD Motifs
    01:55:47 - Interface Between Humans and ML
  • Science & Technology
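
For readers who want a concrete picture of the method described above, here is a minimal sketch of a reward-free, regret-driven curriculum loop in the spirit of the WAKER algorithm discussed in the episode. All names (WorldModelEnsemble, RandomExplorer, reward_free_curriculum) are illustrative stand-ins, not the paper's actual code; ensemble disagreement stands in as a tractable proxy for regret.

```python
import random

class WorldModelEnsemble:
    """Stub ensemble world model; only the curriculum logic matters here."""
    def uncertainty(self, env_id):
        # Disagreement between ensemble members on data from env_id,
        # used as a proxy for how poorly this environment is modelled.
        return random.random()

    def update(self, trajectory):
        pass  # one reward-free training step on the collected trajectory

class RandomExplorer:
    """Placeholder exploration policy; no task reward is ever consulted."""
    def rollout(self, env_id):
        return [(env_id, "obs", "action")]  # placeholder trajectory

def reward_free_curriculum(envs, model, explorer, steps, eps=0.2):
    """Bias data collection toward the environments the world model
    handles worst: the minimax-regret intuition behind the curriculum."""
    for _ in range(steps):
        if random.random() < eps:
            env_id = random.choice(envs)               # occasional uniform draw
        else:
            env_id = max(envs, key=model.uncertainty)  # worst-modelled env
        model.update(explorer.rollout(env_id))

reward_free_curriculum(["cluttered", "empty"], WorldModelEnsemble(),
                       RandomExplorer(), steps=10)
```

Only after this reward-free phase would a task reward be introduced and a policy trained inside the learned world model.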

COMMENTS • 47

  • @Ben_D.
    @Ben_D. 1 month ago +11

    I love the long format and the high-level context. Excellent.

  • @MartinLaskowski
    @MartinLaskowski 1 month ago +6

    I really value the effort you put into production detail on the show. Makes absorbing complex things feel natural

  • @CharlesVanNoland
    @CharlesVanNoland 1 month ago +9

    This is awesome. Thanks Tim!
    "If we just take a bunch of images and try and directly predict images, that's quite a hard problem, to just predict straight in image space. So the most common thing to do is kind of take your previous sequence of images and try and get a compressed representation of the history of images, in the latent state, and then predict the dynamics in the latent state."
    "There could be a lot of spurious features, or a lot of additional information, that you could be expending lots of compute and gradient updates just to learn those patterns when they don't actually impact the ultimate transition dynamics or reward dynamics that you need to learn in order to do well in that environment."

  • @ehfik
    @ehfik 1 month ago +2

    great guests, good interview, interesting propositions! MLST is the best!

  • @NextGenart99
    @NextGenart99 1 month ago +3

    Seemingly straightforward, yet profoundly insightful.

  • @Niamato_inc
    @Niamato_inc 1 month ago +2

    Thank you wholeheartedly.

  • @conorosirideain5512
    @conorosirideain5512 1 month ago +6

    It's wonderful that model-based RL has become more popular recently

  • @diga4696
    @diga4696 1 month ago

    Amazing guests!!! Thank you so much.
    Human modalities, when symbolically reduced and quantized into language and subsequently distilled through a layered attention mechanism, represent a sophisticated attempt to model complexity. This process is not about harboring regret but rather acknowledges that regret is merely one aspect of the broader concept of free energy orthogonality. Such endeavors underscore our drive to understand reality, challenging the notion that we might be living in a simulation by demonstrating the depth and nuance of human perception and cognition.

  • @flyLeonardofly
    @flyLeonardofly 1 month ago +1

    Great episode! Thank you!

  • @Dan-hw9iu
    @Dan-hw9iu 1 month ago +13

    Superb interview, Tim. This is among your best. I was amused by the researchers hoping/expecting that future progress will require more sophisticated models in lieu of simply more compute; I would probably believe this too, if my career depended on it! But I suspect that we'll discover the opposite: the Bitter Lesson was a harbinger for the Bitter End. Human-level AGI needed no conceptual revolutions or paradigm shifts, just boosting parameters -- intellectual complexity doggedly follows from system complexity. More bit? More flip? More It.
    And why should we have expected a more romantic story? Using a dead simple objective function, Mother Nature marinated apes in a savanna for a while and out popped rocket ships. _Total accident._ No reasoning system needed. But if we _intentionally_ drive purpose-built systems toward a mental phenomenon like intelligence, approximately along a provably optimal learning path, for millions of FLOP-years...we humans will additionally need a satisfying cognitive model to succeed? I'm slightly skeptical.
    The power of transformers was largely due to vast extra compute (massive training parallelism) that they unlocked. And what were the biggest advancements since their inception? Flash attention? That's approximating more intensive compute. RAG? Cached compute. Quantization? Trading accuracy for compute. Et cetera.
    If the past predicts the future, then we should expect progress via incremental improvements in compute (training more efficiently, on more data, with better hardware, for longer). We're essentially getting incredible mileage out of an algorithm from the '60s. Things like JEPA are wonderful contributions to that lineage. But if anyone's expecting some fundamentally new approach to reach human-level AGI, then I have a bitter pill for them to swallow...

  • @BilichaGhebremuse
    @BilichaGhebremuse 1 month ago +1

    Great interview

  • @lancemarchetti8673
    @lancemarchetti8673 1 month ago

    Wow! This was awesome

  • @sai4007
    @sai4007 1 month ago +1

    One important thing that world models bring in over a simple forward dynamics model is learning to infer latent Markovian belief-state representations from observations through probabilistic filtering. This distinguishes latent-state world models from plain model-based RL!
    Partial observability is handled systematically by models like Dreamer, which use a recurrent variational inference objective, along with a Markovian assumption on latent states, to learn variational encoders that infer latent Markovian belief states.
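
    For readers who want the mechanics, here is a minimal single-step sketch of that recurrent variational filtering idea, loosely in the spirit of Dreamer's RSSM. The dimensions and module names are illustrative assumptions, not Dreamer's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

class BeliefStateFilter(nn.Module):
    """One filtering step: a prior belief p(z_t | h_t) from the recurrent
    state alone, a posterior q(z_t | h_t, o_t) after seeing the observation,
    and a KL term that trains the prior toward the filtered posterior."""
    def __init__(self, obs_dim=64, latent_dim=16, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRUCell(latent_dim, hidden_dim)
        self.prior_net = nn.Linear(hidden_dim, 2 * latent_dim)
        self.post_net = nn.Linear(hidden_dim + obs_dim, 2 * latent_dim)

    def step(self, z_prev, h_prev, obs):
        h = self.gru(z_prev, h_prev)             # deterministic recurrent path
        prior = self._dist(self.prior_net(h))
        post = self._dist(self.post_net(torch.cat([h, obs], dim=-1)))
        kl = kl_divergence(post, prior).sum(-1)  # variational filtering loss term
        return post.rsample(), h, kl             # Markovian belief sample

    @staticmethod
    def _dist(params):
        mean, log_std = params.chunk(2, dim=-1)
        return Normal(mean, F.softplus(log_std) + 1e-4)

f = BeliefStateFilter()
z, h, kl = f.step(torch.zeros(1, 16), torch.zeros(1, 64), torch.randn(1, 64))
```

    The sampled z is the Markovian belief state the comment describes: downstream dynamics and reward heads can condition on it alone.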

  • @XOPOIIIO
    @XOPOIIIO 1 month ago +1

    I've missed it: why exactly would it explore the world? What is the reward function?

  • @olegt3978
    @olegt3978 1 month ago +2

    Amazing. We are on the highway to AGI in 2027-2030

  • @johnkintree763
    @johnkintree763 1 month ago

    There is a concept of a Wikibase Ecosystem that could become a shared world model on which effective agent actions could be planned.

  • @willbrenton8482
    @willbrenton8482 1 month ago

    Can someone link their work with JEPAs?

  • @maddonotcare
    @maddonotcare 1 month ago +3

    Impressive ideas and impressive endurance to hold that water bottle for 2 hours

  • @master7738
    @master7738 1 month ago

    nice

  • @uber_l
    @uber_l 1 month ago +3

    Here I provide a simple AGI solution: reduction-(simulation-relation-simulation)-action. The simulation could last a variable amount of time: for robots, instant, using only accurate physics; for difficult tasks, using increasingly complex imagination with rising randomness, think human dreams. Give it enough time and/or compute and it will move the world

  • @lancemarchetti8673
    @lancemarchetti8673 1 month ago

    Imagine the day when an AGI agent can retain steganographic data within lossy image formats even after recompression or cropping.

  • @GameShark02
    @GameShark02 1 month ago +1

    what is up my homies

  • @RokStembergar
    @RokStembergar 1 month ago

    This is your Carl Sagan moment

  • @michaelwangCH
    @michaelwangCH 1 month ago

    The search problem is converted into a minimax optimization. But here is the problem: without training data from the specific environment, the maximum regret of each action cannot be computed. As with the MaxCut problem, we cannot know that the function we found gives the best action we can take. By avoiding the worst case on every action, the agent will end up with a model of mediocre performance; every world model would have to be Turing complete, capable of dealing with all possible states. Therefore such models will not exist, especially in stochastic environments where the outcomes are uncertain.
    Conclusion: minimax is a mathematical problem which is still unsolved. Therefore their publication and talk are purely theoretical, and they cannot show empirically that it works with real data. Predicting the latent state in RL is not a new idea either; such models are highly dependent on the environments the agent is in. Learning only a representation in latent space, i.e. the abstract concept of the task without integrating the environment, will not generalize: poor performance.
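
    For reference, the minimax-regret objective being criticized here fits in a few lines. A toy numeric illustration; the returns and the V* table are invented, and V* is exactly the quantity that, as the comment notes, is unknown in practice and must be estimated.

```python
# Toy returns of two candidate policies across two environments.
returns = {
    "policy_a": {"env_1": 9.0, "env_2": 2.0},
    "policy_b": {"env_1": 8.0, "env_2": 5.0},
}
# Best achievable return per environment (unknown in practice; the method
# works with proxies for it, which is the crux of this objection).
V_star = {"env_1": 9.0, "env_2": 6.0}

def max_regret(policy):
    """Worst-case regret of a policy over all environments."""
    return max(V_star[env] - ret for env, ret in returns[policy].items())

# Minimax regret: minimize the worst-case regret rather than maximize
# the worst-case return. Here policy_b (max regret 1.0) beats policy_a (4.0).
print(min(returns, key=max_regret))  # -> policy_b
```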

  • @eliaskouakou7051
    @eliaskouakou7051 1 month ago

    People are so preoccupied with one-upping one another that they never ask: should we?

  • @paulnelson4821
    @paulnelson4821 1 month ago

    It seems like you are going from a totally bounded training environment to "open-ended" AGI. Joscha Bach has a multi-level system that includes Domesticated Adult and Ascended as a way to stratify human development. Maybe you need some kind of Bar Mitzvah or puberty to consider a staged development that would lead to general agency.

  • @johangodfroid4978
    @johangodfroid4978 1 month ago

    Not bad, but far away from the final reward system of an AGI. I know how to build it, and for this reason I can say there is still a long way to go; the reward system is so much simpler.
    However, a really good episode and interesting people

  • @antdx316
    @antdx316 1 month ago

    👍

  • @eliaskouakou7051
    @eliaskouakou7051 1 month ago

    Intelligence isn't about being optimised but about being set free. You can't develop intelligence in a box

  • @Anders01
    @Anders01 1 month ago

    My amateur guess is that AI models will start to learn by themselves to become more general, especially things like robots and IoT devices that can receive a lot of data from the physical world. In the beginning some strategy hardcoded by humans might be needed, but after a while the AI models can start to optimize themselves, connected to computer clouds.

  • @geldverdienenmitgeld2663
    @geldverdienenmitgeld2663 1 month ago +3

    The data does not come from humans. Data comes from the world. And if humans could gather the data, machines can gather it as well.

    • @johnkintree763
      @johnkintree763 1 month ago

      Agreed. Language models can recognize entities and relationships, and represent them in a graph structure, which becomes the world model on which agents can plan actions.
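
      A toy sketch of that pipeline, with invented triples standing in for whatever a language model might extract:

```python
# (subject, relation, object) triples, as might be extracted by an LLM;
# these examples are made up for illustration.
triples = [
    ("kitchen", "contains", "coffee machine"),
    ("coffee machine", "requires", "water"),
    ("sink", "provides", "water"),
]

# Adjacency-list graph: the "world model" an agent could plan over.
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

# Example query: what provides the "water" the coffee machine requires?
providers = [s for s, edges in graph.items()
             if ("provides", "water") in edges]
print(providers)  # -> ['sink']
```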

    • @tobiasurban8065
      @tobiasurban8065 1 month ago

      I agree with the intuition but reject the detached observer perspective on agent versus world. I would phrase it as: the information for the system comes from the environment of the system, where the observer itself is again a system.

  • @rodneyericjohnson
    @rodneyericjohnson 1 month ago

    You see how making an AI model that seeks the unpredictable to make it more predictable leads to the end of all life, right?

  • @cakep4271
    @cakep4271 1 month ago

    I'm confused about the synthetic data thing. How could fake data ever actually be useful for learning something? How can studying fiction teach you about reality? It seems like it would just muddle what you learned from reality directly with stuff that's not true in reality.

  • @johntanchongmin
    @johntanchongmin 1 month ago +1

    My answer: No, we can't. But we can build a generally intelligent agent within a fixed set of environments that can use the same pre-defined action space

  • @aladinmovies
    @aladinmovies 1 month ago

    AGI is here

  • @antdx316
    @antdx316 1 month ago

    AGI being able to figure out what you need to happen before you can figure it out yourself is going to require the world to have a working UBI model soon or else.

    • @awrjkf
      @awrjkf 1 month ago

      We need to start working on a UBI model now. I am also saving to buy a piece of land for farming; I think we all should, no matter where the land is, as long as it is fertile. Because no matter what happens to the economy, as long as we can sustain ourselves, it would be a good safeguard for survival.

  • @dg-ov4cf
    @dg-ov4cf 1 month ago

    nerds

  • @Onislayer
    @Onislayer 1 month ago +17

    Optimizing towards a Nash equilibrium still won't be generally intelligent. The intelligence in life that has pushed humanity forward sits very much at the extremes, not at some game-theoretically optimal objective. "Innovation" through optimization is lazy and uninspired.

    • @RanjakarPatel
      @RanjakarPatel 1 month ago +3

      This incorrectly my dear but I am so proud four you’re try. Everyone’s need four improve branes four become expertise like four me. I am number computer rajasthan so please take care four you’re minds four acceleration educating

  • @Greg-xi8yx
    @Greg-xi8yx 1 month ago

    This isn’t even up for debate anymore. The only question is: is it 1 or 5 years away?

  • @andybaldman
    @andybaldman 1 month ago

    This all seems like really complicated ways of saying very simple things, which these models are not going to fully solve. The models are all too simple. No matter how they are architected, as long as they are made of non-agential parts, they will always be brittle.

  • @rcstann
    @rcstann 1 month ago +6

    1st!
    I'm sorry Dave,
    I'm afraid I can't do that.
    🔴