Why next-token prediction is enough for AGI - Ilya Sutskever (OpenAI Chief Scientist)

  • Published 5 Sep 2024
  • Full episode: • Ilya Sutskever (OpenAI...
    Transcript: www.dwarkeshpa...
    Apple Podcasts: apple.co/42H6c4D
    Spotify: spoti.fi/3LRqOBd
    Follow me on Twitter: / dwarkesh_sp

COMMENTS • 510

  • @claymarzobestgoofy
    @claymarzobestgoofy 8 місяців тому +166

    Dude, your titles are always so misleading. You seem to put words in those famous people's mouths that they never said. Like here, the quote you attribute to Ilya is never said by him, and 3blue1brown also never said "too many software engineers"

    • @vaaal88
      @vaaal88 4 місяці тому +8

      Thanks, I'll not watch this then and tag it as "not interested" in the channel.

    • @ryzikx
      @ryzikx 4 місяці тому

      how

    • @Christian_Luczejko
      @Christian_Luczejko 4 місяці тому +3

      You replaced the dislike button for me

    • @nas8318
      @nas8318 4 місяці тому +3

      You could argue it's a human "hallucination"

    • @ArielChelsau
      @ArielChelsau 4 місяці тому +2

      It's not the first time he does that. That's called clickbait.

  • @EdFormer
    @EdFormer 8 місяців тому +74

    Ilya Sutskever, supposedly the smartest guy in AI, making the argument that correlation implies causality. If there was any truth to the point that the next token prediction objective can lead to internal models of causality (fundamental understanding of the process that generated the tokens) then the hallucination problem wouldn't exist. It exists because a given string of words can be statistically likely, with respect to a measure of conditional word frequency on existing text, even if the statement it makes is provably false. Internal models of causality, on the other hand, would prevent it. More broadly, if next token prediction was enough for AGI (assuming that had a concrete definition/benchmark and wasn't just an infuriatingly misleading buzzword), it would be solving far more problems than just simulating human generated text - it doesn't work for video, let alone self-driving or robotics.
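    A toy sketch of the mechanistic point above (the corpus and sentences are invented for illustration, not taken from the video or the comment): a model that only tracks conditional word frequencies can assign a fluent but false continuation the same probability as a fluent true one.

    ```python
    from collections import defaultdict

    # Toy bigram "language model": it only counts which word follows which,
    # so any continuation that is locally frequent looks plausible to it,
    # whether or not the resulting sentence is true. The corpus is invented.
    corpus = [
        "the sun rises in the east",
        "the sun sets in the west",
        "the sun rises in the morning",
    ]

    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1

    def p_next(prev, nxt):
        total = sum(counts[prev].values())
        return counts[prev][nxt] / total if total else 0.0

    def sentence_prob(sentence):
        words = sentence.split()
        p = 1.0
        for prev, nxt in zip(words, words[1:]):
            p *= p_next(prev, nxt)
        return p

    # Both come out equally "statistically likely", but only one is true.
    print(sentence_prob("the sun rises in the east"))  # fluent and true
    print(sentence_prob("the sun rises in the west"))  # fluent and false
    ```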

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому +17

      Wrong. You can't say hallucinations represent a complete lack of learning, especially when the better models are getting increasingly good at not hallucinating. That's just a lack of training data, not a problem with the method. Humans hallucinate answers due to a lack of training/learning. Some humans have more "training data" than others, and it's the same case for these models.

    • @EdFormer
      @EdFormer 8 місяців тому +26

      @@DavidUrulski-wq9de who said that hallucinations represent a complete lack of learning? Nice attempt to strawman 👏 they absolutely _learn_, my point is that what LLMs learn is to predict the next token based on what is statistically likely, with respect to a measure of conditional word frequency on existing text, rather than an internal model of causality that provides fundamental understanding of the process that generated the text as Ilya poorly attempts to counter here. Your misrepresentation and lack of engagement with the mechanistic argument says a lot about why you might come to the conclusion that we just need more data. Before I go there though, what models have been getting increasingly better at not hallucinating? GPT4 is still the best and it hallucinates like mad. But onto the idea that we just need more data, please - it would take a human thousands of years to read everything that GPT4 has been trained on. And humans don't hallucinate like LLMs at all, don't be silly now. Why don't we, though? Because we base the words we generate on a plan to communicate what we want to communicate that is internally formulated through a mental simulation of the consequences of what we communicate and how well those consequences fit with our internally formulated objectives, all of which is based on learning from actually experiencing the world and not just reading about it.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому +12

      @@EdFormer GPT4 doesn't hallucinate all that much. It does, but not to the extent that you are trying to make it out to be. Humans hallucinate far worse than ChatGPT does, so your other argument is out the window. The majority of us still believe there is some guy in the sky that looks like us and that we are gonna die and fly up into the sky with angel wings to live eternally. This is purely because of what we have been taught by others, which is bad training data. You are trying to say that the LLM logic is flawed to the point it can't represent human intelligence or better, which just has no basis in reality. It is already far beyond most humans for many things. Your mention of "causality" is just bullshittery at best. Humans themselves are just token predictors. Our entire way of communicating is learnt the exact same way as an LLM learns, and much of our intelligence is based off our language. That is why when we think, we are often speaking inside our heads. From birth we learn words from other people saying them, we learn the relation between the words and how they can be used together to form a meaningful/good sentence; this is the exact same thing LLMs are doing. Our brains are supercomputers, so we are still better than machines at doing it, not in breadth but in specifics (for the time being). Regardless, they are not special, they are not run off some divine method that only humans can have. We are token predictors. Our brains are token predictors. It's how we are designed to stay alive, to make sure we know what comes next with great certainty: if the bushes make a sound, it could be a tiger or the wind, and we use context to know what's likely to come next. It's the data behind the causality. It's what we have learnt based on that training. If we have seen 100 tiger attacks in the last week, we know it's probably a tiger. If the wind has been blowing for a week and no tigers have been seen, it's likely not a tiger. Data and learning are the only way to get causality. It's not some god-given gift. It's a method.

    • @EdFormer
      @EdFormer 8 місяців тому

      @@DavidUrulski-wq9de The importance of causality for intelligence is bullshittery at best? I fear you're a lost cause, mate. But let's have a go. Religion might be fantasy, but it's a fantasy that has been created by an innate need in humans for causal explanations. It is a failure of reasoning rather than a hallucination - these are fundamentally different things. So no, my argument stands firm. Causality is how we make sense of the world, it is the basis on which we determine what action to take, plan our future, etc etc. Did you know that the regions of the world that haven't seen war for the longest period of time are typically the regions with populations of wild rabbits? Your view of intelligence would suggest that we should know from this that rabbits cause peace. But, of course, they dont, and I know that because my internal causal model of the world tells me that there is no mechanism that would allow for wild rabbits to influence geopolitics, despite this correlation. Even Ilya makes clear the importance of this fundamental principle of intelligence by arguing in this video that this could be done within the network in its process of predicting the next token. I just see no mechanism for that. As for language, you do realise that such a complex means of communication is unique to humans, don't you? And you do realise that other species of animals, including pre-merge function humans, are/were also intelligent? Language is just icing on the cake, and we would be in a more promising position if we had something that was cat-level intelligence that couldn't communicate with language than something that can communicate with language but that does not have an understanding that is grounded in the real world (and, hence, hallucinates). Honestly, mate, you've been duped, we are not just next token predictors. We observe the world, consider potential sequences of actions, simulate their consequences based on our internal causal models, assess how well the consequences fit with our objectives, optimise the sequence of actions accordingly, and pursue the best solution we can come up with in the time we have. I'm not implying that this is divine as you oddly suggest, I'm saying that human intelligence would be much better modelled with a new architecture that allowed for these abilities - next token prediction doesn't allow for this (unless Ilya is right about it taking place within the mapping from the context to the prediction, which I argue the hallucination problem proves is false - you still haven't addressed the mechanistic argument, interestingly). Why do you think there is the hype about Q*, an architecture that supposedly incorporates AlphaGo-style tree search? AlphaGo and its derivatives clearly model intelligence in a far more human-like manner. Finally, I am saying that the LLM logic is flawed and my argument has every basis in reality. You say that they are far beyond humans in many things, but where are LLM powered self-driving cars? Where are LLM powered robots performing manual labour jobs better than humans? Where are the LLM scientists revolutionising their fields? Where are the LLM artists coming up with new art forms that change the way we think? The list goes on. Seriously, mate, stop drinking the kool aid and step off the OpenAI hype train before it inevitably hits the wall.

    • @brandongillett2616
      @brandongillett2616 8 місяців тому +6

      First off, hallucination rates absolutely have been decreasing with newer models, such that GPT4's grounded hallucination rate is about 3% (previous models were around 20%).
      Second, humans hallucinate plenty. That is what optical illusions are. That is what false memories are. That is what seeing shapes in the clouds is.
      Third, you make the assertion that if an AI had an understanding of the world, it would not hallucinate, but there is a very good reason this would not be true. Let's say that during its training, the AI encounters a piece of data that its world model is not sophisticated enough to accurately predict. What should it do? Should it say that it does not know and move on? No, because that does not minimize its cost function. Instead, the best way to minimize its cost function when it does not know something is essentially to "guess". That way it at least gets points because the structure of the sentence is similar, and chances are that the semantic similarity of whatever its made-up answer was will be close to the real answer, thereby minimizing its cost function. That wouldn't imply that the AI does not have a world model; it only implies that it is trained to guess when it does not know the answer.
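      A rough sketch of why the training objective rewards guessing (the vocabulary and probability numbers are invented assumptions): cross-entropy loss penalizes putting mass on "I don't know" whenever the training text contains a definite next token, so a confident, semantically plausible guess scores better.

      ```python
      import math

      # Toy illustration: under cross-entropy (negative log-likelihood of the
      # next token), a confident plausible guess scores far better than hedging
      # toward "unknown" whenever the training text contains a definite answer.
      # The vocabulary and probabilities are made-up assumptions.

      def cross_entropy(pred_probs, true_token):
          return -math.log(pred_probs[true_token])

      guess = {"Paris": 0.7, "Lyon": 0.15, "Marseille": 0.1, "unknown": 0.05}
      abstain = {"Paris": 0.1, "Lyon": 0.1, "Marseille": 0.1, "unknown": 0.7}

      true_token = "Paris"  # the token that actually follows in the training text
      print(round(cross_entropy(guess, true_token), 3))    # ~0.357
      print(round(cross_entropy(abstain, true_token), 3))  # ~2.303
      ```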

  • @Shawnecy
    @Shawnecy День тому +1

    "they have thoughts and feelings and they have ideas and they do things in certain ways, all of that can be deduced from next token prediction".... yeah, right after you find a way to have all those things transcribed so the LLM can deduce them. This is currently not the state of reality and may never be and seems to be the long pole in the tent of LLM AGI.

  • @jebprime
    @jebprime 8 місяців тому +28

    Next-token prediction may be enough,
    but only by requiring a lot more computation/parameters than other methods.
    That's what I've noticed based off chatgpt.
    Bigger models + better sampling techniques
    -> better next token prediction.
    But is it the most optimal way? Probably not.
    Just requires less thinking
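    A minimal sketch of what "better sampling techniques" can mean in practice, with made-up logits and a hypothetical vocabulary (temperature plus top-k filtering over next-token scores):

    ```python
    import math, random

    # Temperature + top-k sampling over next-token logits (all values invented).
    def sample_next_token(logits, temperature=0.8, top_k=3):
        # Keep only the k highest-scoring candidates.
        top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        # Temperature < 1 sharpens the distribution, > 1 flattens it.
        scaled = {tok: score / temperature for tok, score in top}
        z = sum(math.exp(s) for s in scaled.values())
        probs = {tok: math.exp(s) / z for tok, s in scaled.items()}
        return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

    logits = {"cat": 2.1, "dog": 1.9, "car": 0.3, "banana": -1.0}
    print(sample_next_token(logits))
    ```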

    • @rickevans7941
      @rickevans7941 8 місяців тому +2

      Profound insight JEB lol

    • @jebprime
      @jebprime 8 місяців тому +1

      @@rickevans7941 Thanks! Yeah, it makes sense as well if you think about computational capabilities.
      If a model is bigger (it has more parameters), it can handle and analyze more complex patterns for a fixed input and output length, at least compared to smaller models.
      But it's still limited by the fact that it has to produce an answer within that output length.
      What if it could ponder and wasn't restricted to that output length? That's the way we approach problems as people.

    • @eidiazcas
      @eidiazcas 8 місяців тому +3

      correct, cognition will be an emergent property hidden in the network, very inefficient though

    • @brandongillett2616
      @brandongillett2616 8 місяців тому

      Quality thought is an expensive resource. Sometimes the most optimal way is the way that requires the least thinking. Especially in this case where if you scale up large enough, quality thought gets cheaper and cheaper.

    • @lucasparisi7394
      @lucasparisi7394 4 місяці тому

      but then it's only limited by computation and scaling laws, not by another topology or ML paradigm.

  • @eoghanf
    @eoghanf 8 місяців тому +176

    I'd just like the future AGI to see that I posted the second comment on this video. Please spare me.

    • @ipurelike
      @ipurelike 8 місяців тому +10

      not good enuf

    • @user-lv9gm3fe6j
      @user-lv9gm3fe6j 8 місяців тому +2

      Roko's basilisk makes no sense; it would just be a simulation (digital clone) of you. Do you think that if a biological clone of yourself gets tortured, you get tortured? Makes no sense.

    • @saliverdis
      @saliverdis 8 місяців тому

      And I liked it! So maybe also spare me too.

    • @fullmentalalchemist3922
      @fullmentalalchemist3922 8 місяців тому

      @@dot1298 pretty sure it was a joke

    • @RealShinpin
      @RealShinpin 8 місяців тому

      Damned Roko's...

  • @UFOgamers
    @UFOgamers 8 місяців тому +44

    I love how people say 'it's just statistics' while our universe itself is a statistical machine. Fundamental particles such as electrons are literally a probability distribution, and become an observation/particle when observed. It's truly the definition of a random variable.

    • @Art-is-craft
      @Art-is-craft 8 місяців тому +4

      Right now it is just statistics with power application. Once such a system is present within the human environment it will look much different.

    • @vectoralphaSec
      @vectoralphaSec 8 місяців тому +5

      Right. The universe, the world, reality itself is regulated by statistics. So therefore it should be in the equation to creating intelligent artificial life.

    • @hanhai8515
      @hanhai8515 8 місяців тому +1

      @@Art-is-craft there is no "just statistics", statistics is the most advanced interpretation of the universe.

    • @Art-is-craft
      @Art-is-craft 8 місяців тому +6

      @@hanhai8515
      Statistics without a system model means nothing. Just because you can count the patterns of a dripping tap does not mean you understand why the tap is dripping.

    • @Hexanitrobenzene
      @Hexanitrobenzene 8 місяців тому +8

      ...I am hearing Sean Carroll saying "No, actually electron is an oscillation in a quantum field and the probability distribution (over its possible states) you mention is just a mathematical description which fits this model the best."

  • @ploppyploppy
    @ploppyploppy 4 місяці тому +3

    Can you give us the timestamp where he said 'next-token prediction is enough for AGI'? No? Blocked.

    • @artukikemty
      @artukikemty Місяць тому

      He says it right at the beginning, at 0:02: "Next token prediction can also pass human performance", assuming human performance is AGI

  • @maxziebell4013
    @maxziebell4013 8 місяців тому +20

    In another insightful audio-only interview, Ilya Sutskever drew a parallel between the role of a Large Language Model (LLM) in foreseeing the climax of a one-of-a-kind murder mystery and the complexities of advanced AI reasoning. He elaborated that for the LLM to accurately deduce the identity of the murderer at the tale's end, it must engage deep nested connections and logical relations, demonstrating an understanding that goes beyond simple task completion. This analogy helped me appreciate the intricate states of such models and realize that underestimating their capacity to possess an ever refining “world model” would be shortsighted.

    • @joeroganpodfantasy42
      @joeroganpodfantasy42 8 місяців тому

      what podcast was that

    • @maxziebell4013
      @maxziebell4013 8 місяців тому +5

      @@joeroganpodfantasy42 it was "What, if anything, do AIs understand? with ChatGPT Co-Founder Ilya Sutskever" by Clearer Thinking with Spencer Greenberg

    • @joeroganpodfantasy42
      @joeroganpodfantasy42 8 місяців тому

      Thanks, found it. The exact quote: ua-cam.com/video/NLjS1UOr8Nc/v-deo.htmlsi=m1wtu40O0L3QMq4A&t=349 @@maxziebell4013

    • @joeroganpodfantasy42
      @joeroganpodfantasy42 8 місяців тому

      It's amazing how simply he put it compared to the big words you used, which hide the meaning.
      Also, it wasn't a murder, it was just a mystery novel; I found that funny.

    • @maxziebell4013
      @maxziebell4013 8 місяців тому +1

      ​@@joeroganpodfantasy42 Are you referring to this podcast, or the one I recommended earlier? Regardless, each person processes information uniquely. Communication is like intersecting circles between the sender and receiver. This simply reflects my tendency to contextualize what I hear. 😊

  • @johnbrown7714
    @johnbrown7714 8 місяців тому +68

    During my first listen to this interview this pov from Ilya hit me so hard. So profound

    • @mlock1000
      @mlock1000 8 місяців тому +3

      He really appears to have a deep and clear understanding of all this.

  • @rajeshktrivedi
    @rajeshktrivedi 8 місяців тому +9

    To test Ilya's hypothesis that "next-token prediction" can extrapolate to new insights and concepts, let's conduct a thought experiment. Imagine an LLM that has been trained on everything except the invention of transistors and the semiconductor industry from 1948 onwards. Would it be able to extrapolate and answer intelligently about current 3nm semiconductor systems?

    • @karuonline3294
      @karuonline3294 5 місяців тому +1

      Yeah, I don't think so, because those answers wouldn't even feature in the realm of possibilities. The idea that predicting what a superintelligent hypothetical being would do is possible based on data from regular folks seems flawed in a similar way.

    • @harbirsingh7266
      @harbirsingh7266 5 місяців тому +3

      I'd argue that it'll be able to, not immediately just from the trained data, but eventually. Think of LLMs as the human civilization. Humans did not immediately go from the invention of fire to semiconductors either, but instead the knowledge was built on top of the known facts. Once LLMs can be trained to do science, they'll progress much faster than humans can, which is how they'll surpass humans.
      TLDR: they'll be doing the exact same thing as the human civilization, but faster.

    • @rajeshktrivedi
      @rajeshktrivedi 5 місяців тому +1

      @@harbirsingh7266 Can an LLM offer "improbable explanations" for scenarios where there is no data? For instance, could an LLM generate the theory of time dilation without any human ever expressing a need for it, and without there being any data for this conclusion until 1915?

    • @FunkyJeff22
      @FunkyJeff22 4 місяці тому

      Could humans even do that?

    • @Wagner-uv6yp
      @Wagner-uv6yp 4 місяці тому +3

      @@harbirsingh7266 I think people confuse AGI with being "all-knowing", but it should be viewed more as "highly capable and generalizable". I believe advanced statistical prediction could create a true reasoning AI, but we are not there yet.

  • @luckerooni1153
    @luckerooni1153 8 місяців тому +23

    "Surpassing human performance" to me is not the same thing as AGI in the first place. Depending on the human, depending on the task, depending on the model's data, it can predict a correct result better than a human of some relative level of skill. That's already a thing. And in fact that's been a thing since before a neural network was used, just in video games AIs regularly "surpass human performance" and in many ways you could say beating a video game is "surpassing a developer-gated AI performance level that could be even more difficult but it wouldn't be fun." And in fact the very idea of "extrapolation" which is the idea of making sub-hypotheses outside of what's known based on data is one of the very things next-token prediction cannot do that the "general intelligence" of humanity can. Just as he says "It's not statistics... well it is, but..." the NN is still stuck within the framework of already gathered data-points and it doesn't have a good system for cleaning up data and that and many other limiting factors hold it back.
    It's hard to take people serious when they say this stuff is spooky scary skeletons when they are the ones making it and can't even admit to the actual limitations and they themselves think they're building a science-fiction species of digital lifeforms and not networked statistical prediction models. You're not building a digital baby, you're just algorithmically condensing data better than we did it before. It's not as crazy as it seems and because of that it's also not as dangerous as it seems.

    • @BlueMan-ld3ef
      @BlueMan-ld3ef 8 місяців тому

      Someone finally said it!

    • @eidiazcas
      @eidiazcas 8 місяців тому +5

      you must be a greater genius than Ilya and all the other incredibly smart people that agree with him, so we should listen to you instead of the ones who actually achieved the greatest advancements in AI, what papers have you written? or maybe you're just talking out of your *

    • @aartemiswhite
      @aartemiswhite 8 місяців тому +5

      Gather everybody, we have an expert here who has these models completely figured out. Forget the countless scientists and engineers in this multi-billion-dollar industry scratching their heads trying to figure out how these models have emergent properties and are getting better with scale. Oh, and even if you did know what you are talking about, you make it seem like somehow compressing petabytes of training data into a dozen gigabytes is some run-of-the-mill algorithm. An algorithm that can communicate fluently on subjects it hasn't been trained about, mind you.

    • @lolololo-cx4dp
      @lolololo-cx4dp 8 місяців тому +1

      ​@@eidiazcas Oh my god, it's not like false advertising is new things, even google just did that recently on their Gemini ad and technical report paper. You can always read, instead of just believing.

    • @eidiazcas
      @eidiazcas 8 місяців тому +1

      @@lolololo-cx4dp I don't just believe, I think that all previous emergent properties from AI are proof enough to agree with Ilya, also he is not a marketing guy, he's a researcher, and one of the greatest in the field

  • @pauledam2174
    @pauledam2174 8 місяців тому +1

    How would a person who really understands what AGI is about and who has great knowledge, capability, and wisdom evaluate this conversation and the argument made here? Answer by GPT:
    1. Understanding Human Performance: They would acknowledge the challenge in surpassing human performance in next-token prediction. Human cognition is not just about predicting the next word but involves understanding context, emotions, cultural nuances, and creativity.
    2. Limitations of Imitation: The point that neural networks primarily imitate human behavior is crucial. AGI involves not just imitation but the development of original, creative thought processes, something current models are limited in achieving.
    3. Understanding vs. Prediction: The conversation touches on a critical point - understanding the underlying reality is not the same as predicting outcomes. Current AI models are adept at statistical predictions, but understanding the 'why' behind these predictions is more complex and is a key area for AGI development.

  • @mp9305
    @mp9305 4 місяці тому +32

    Yann LeCun dedicates all his time now to fighting this madness 😂

  • @over9000andback
    @over9000andback 8 місяців тому +12

    Token prediction is one thing, but AGI needs to be object-aware beyond tokens, and have transferable knowledge about these objects. Right now, if an LLM is trained that A = B, it does not understand that B = A. This is because it just sees A and B as tokens and can't predict what it hasn't learned.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому +1

      Not really. It can learn that B = A even if it hasn't seen that before quite easily, it just looks at many other examples where it's a similar case and infers it. This is the reason you can generate unique text that has never been seen before. To take coding for example, the reason you are able to ask it to write code to print( "blabla69420jibbajabba") to the screen isn't because it has seen that exact statement before, but because it has learnt from the training data how to print any string. and how to generate the next token in a way that will take the context of what it's learnt and the context of what the prompt is to shape its output.. regardless of if it has seen that specific case before.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому +1

      This is the reason that the LLM makers when doing their evaluations try their best to remove the evaluation tests from their training data.. because we are only really interested in results that it can infer from learnt behaviour, not simply repeating training data.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому

      Me: If hdudirjr equal to jficueoe, what is jficueoe equal too?
      Chatgpt: If "hdudirjr" is equal to "jficueoe," then "jficueoe" is equal to "hdudirjr." In other words, they are equivalent to each other.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому

      With above, it has learnt based off the context what is correct. It has never seen the random words before, it has just seen the surrounding parts and inferred the correct way to answer. This is learning behaviour.

    • @DavidUrulski-wq9de
      @DavidUrulski-wq9de 8 місяців тому +1

      This is the same way humans learn. We are basically just token predictors ourselves. From being born we learn one word, then two, then three, then we learn to use those words together based on context. That's basically what language is; we don't just have some divine spirit within us that generates our speech. We have learnt from what other people have spoken (training data), and we predict which word should come after which other word to get our message across. Obviously our brains do this very fast without us even knowing, but that's the case.

  • @spense
    @spense 8 місяців тому +24

    Hard disagree. The output is just what average people think someone smart would say. We're not getting AGI with LLMs alone.

    • @nanotech_republika
      @nanotech_republika 8 місяців тому

      good! we also need your kind of people in the future

    • @nastied
      @nastied 6 місяців тому

      You forgot about emergent properties. If you spend enough time with GPT-4, you see that it reasons better than most average people. Yes, ChatGPT makes mistakes, but so do humans, quite a lot of them actually.

    • @RhumpleOriginal
      @RhumpleOriginal 4 місяці тому

      They are not using the LLMs in a way that would allow it. It should be possible, at least if what I and a friend are working on proves true.

  • @skit555
    @skit555 8 місяців тому +16

    "My machine to project shadows from puppets will eventually make shadows so accurate that surely the puppets behind will be equivalent to the Real World"

    • @skit555
      @skit555 8 місяців тому +6

      I know it's not exactly his point, as he's after a swarm of agentic LLMs, and it's clearly a powerful ability of fine-tuned LLMs; but it doesn't make it AGI. It misses the deeper cognitive structure to process information as we do and not just imitate part of our outspoken behavior.

    • @mygirldarby
      @mygirldarby 8 місяців тому +6

      ​@@skit555 our brains are prediction machines. This is what people don't seem to understand.

    • @skit555
      @skit555 8 місяців тому +1

      @@mygirldarby Do you really believe our brains are exclusively prediction machines?

    • @skit555
      @skit555 8 місяців тому

      @@tooilltb I guess my understanding could improve on the technicalities, why?
      I didn't get the second part of your comment but I googled "epulis" and am now disgusted.

    • @mikebarnacle1469
      @mikebarnacle1469 8 місяців тому +2

      ​@@mygirldarby That's one part of it. We also have ingenuity to come up with novel ways to change the future. We ponder and ask "what if" and I'm not sure how you can train that into an LLM. We also explain things, make hypothesis and test them, which isn't really prediction itself it just uses prediction as a next step in testing a conceptual model of the world that we imagined. There's really absolutely 0 of that in any LLM today so it doesn't seem to me like scaling up is suddenly going to unlock those capabilities.

  • @margaretadovgal1677
    @margaretadovgal1677 8 місяців тому +24

    The way I describe generative AI to people new to it is generally that we have created many systems of meaning that reflect reality. They exist in many bodies of work, across disciplines, but they are not a perfect reflection, because these systems are filtered through our perceptions, then conceptions, and then into written or other symbolic forms, where they are embedded as systems of meaning. AI learns to pick up the common threads that underpin that system, and extrapolate from them. It doesn't have to be self-aware to reflect back onto us our own understanding, or to even surpass it. If that proxy for deep understanding is reasonably consistent with what people themselves would produce, across domains, or exceeds it, then we've attained AGI.

    • @EskiMoThor
      @EskiMoThor 8 місяців тому +8

      I suspect you are right about self-awareness, it probably isn't needed for attaining more intelligence, but I am more doubtful that other differences between how our brains work and how generative AI currently works won't make a bigger difference.
      Our brains are complex self-organizing dynamic systems, embedded in a complex self-organizing dynamic system (the body), which interacts with complex self-organizing dynamic systems (interpersonal interactions, family, community, society). While AIs currently are relatively complex, they do not behave as self-organizing dynamic systems: they do not update their internal models themselves, they don't integrate new information, new relationships, new situations.
      Time, effort, and risk have no impact on them; they interact linearly with the world while our brains interact in non-linear, recursive ways.
      I don't know how this works, though, I just describe it as I see it, so I give it a 50-50 chance that Ilya is right, predicting the next token could be enough, but it could also be missing something essential.

    • @MCA0090
      @MCA0090 8 місяців тому +1

      @@EskiMoThor There are recent liquid neural networks that have plasticity; they can change the connections between neurons and adapt and learn while doing tasks. Those networks are very good at dealing with vision, images, videos and time, and they are very tiny compared to other types of NNs. Sounds promising... if people find a way to build efficient and smaller NNs that can have plasticity similar to natural brains, that would be awesome.

    • @jomonger-g1f
      @jomonger-g1f 8 місяців тому +4

      I do not agree at all. We are far, far away from AGI, and it's a completely unscientific approach to lower the demands we set on hypothetical AGI, only to be able to give a lesser tool the same name for marketing purposes.
      OpenAI is just gearing up for an IPO. That's why they did all that theater with the CEO change, and then claimed that it's so super that it's AGI, etc.
      The technology has been the same for 20 years.
      The difference was made by NVIDIA, not OpenAI. Today you have hundreds of free models with the same quality as GPT.
      The next move is for IBM and Microsoft.

    • @primeryai
      @primeryai 8 місяців тому +2

      ​@@jomonger-g1f Hundreds of models on the level of GPT4? Please, please name a few. Serious request.

    • @jomonger-g1f
      @jomonger-g1f 8 місяців тому

      @@primeryai Depends what you need. But for example Euryale, Chronos, and lzlv are comparable with GPT-4, but free. Mistral is interesting. Many are built on the base of Meta's LLaMA. There are hundreds on the level of GPT-3.5 and a few on the level of GPT-4. You can run them on your own PC.
      There are also specialized AIs, like Python WizardCoder or FinGPT.

  • @Renvoxan
    @Renvoxan 8 місяців тому +74

    FEEL THE AGI!

  • @ManicEightBall
    @ManicEightBall 4 місяці тому +2

    This is so completely wrong. This is not how humans work, and misses out on important features.

    • @Franklyfun935
      @Franklyfun935 3 місяці тому

      Ilya never said anything about how humans work. He’s talking about how LLMs trained on next token prediction work. What are you talking about?

  • @mygirldarby
    @mygirldarby 8 місяців тому +65

    He's right. Our brains behave in the same basic way. Neurons are little predictive machines in and of themselves. Many studies have shown that at the basic level that is what brain neurons do, they predict.

    • @adammackintosh9645
      @adammackintosh9645 8 місяців тому +2

      I think it's because our brains learned to close the gap; similar to selecting where to bite while chasing a rabbit, we're just predicting the time and location for the next clamp-down.

    • @falklumo
      @falklumo 8 місяців тому +13

      Neurons fire, they don't predict. Our cortex predicts.

    • @EskiMoThor
      @EskiMoThor 8 місяців тому +2

      There are still a lot of things to consider, even if neurons are essential for our brains we don't fully understand how intelligence emerges and why it is different in various animal species and in various individuals.
      There are also other cell types involved in our cognitive machinery, such as glial cells and astrocytes.
      We do know that there is structure of neuronal networks to consider too, in particular the middle prefrontal cortex seems to be important for attuned communication (understanding context, states of others), insight, empathy, morality, intuition, as damage to this area tends to make people 'cold, mechanic, inflexible' .. like computers or simple organisms.
      So, it may or may not work to just add more neurons and connections between them, but if nature is any indication, it is not very cost-effective if it works at all.

    • @joeroganpodfantasy42
      @joeroganpodfantasy42 8 місяців тому

      Even though it's kind of the same, they do A/B testing: they try things and do pattern recognition.
      Prediction is not as useful a model to me as try and fail, try and fail until you succeed at a desired outcome.
      Trial and error, that's the modus operandi of the brain.

    • @joeroganpodfantasy42
      @joeroganpodfantasy42 8 місяців тому

      @@mrc3ln I wish you would expand on that or give an example, because your comment probably only makes sense to you and your experience right now.

  • @maxentropy0305
    @maxentropy0305 7 місяців тому +11

    The training of LLMs is based on prediction of the next tokens, but SOMEHOW this seemingly simple training process helps the LLMs achieve real understanding of the subject matter. The exact mechanism by which that real understanding is acquired is fascinating and somewhat of a mystery, I think.

    • @Anton_Sh.
      @Anton_Sh. 5 місяців тому +2

      Because if you give numerical relational representations of several different colors to a Transformer model, it's going to learn how to visualize the whole visible light spectrum between them and even beyond!
      The language data in text form that GPTs are trained on is the analog of lidar-gathered data, but for the semantic space that we humans have mapped.

    • @zerge69
      @zerge69 5 місяців тому

      It's no mystery, really. It determines the next token based not only on the previous token, but on the previous words, the previous phrases, the previous paragraphs, the previous concepts, the previous ideas, and the whole of human language. See?

    • @planeteyewitness
      @planeteyewitness 4 місяці тому

      The previous token has applied attention to all other tokens in the input
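      A minimal sketch of that idea: the prediction at each position attends to all earlier tokens, not just the immediately previous one (single head, random weights, shapes chosen only for illustration; this is not the actual GPT implementation).

      ```python
      import numpy as np

      # One head of causal self-attention: the representation used to predict the
      # next token mixes information from every earlier position, masked so it
      # cannot look at future tokens. Weights and shapes are random placeholders.
      def causal_self_attention(x):
          seq_len, d = x.shape
          rng = np.random.default_rng(0)
          Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
          q, k, v = x @ Wq, x @ Wk, x @ Wv
          scores = q @ k.T / np.sqrt(d)                     # (seq_len, seq_len)
          mask = np.triu(np.ones((seq_len, seq_len)), k=1)  # 1 above the diagonal
          scores = np.where(mask == 1, -1e9, scores)        # block future positions
          weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
          weights /= weights.sum(axis=-1, keepdims=True)    # softmax over visible tokens
          return weights @ v                                # each position blends all prior ones

      x = np.random.default_rng(1).normal(size=(5, 8))      # 5 tokens, 8-dim embeddings
      print(causal_self_attention(x).shape)                 # (5, 8)
      ```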

    • @MichaelDomer
      @MichaelDomer 4 місяці тому

      @@zerge69
      *_"It's no mystery, really"_*
      OpenAI has stated many times that they often have no clue how these models are getting this good, but you have all the answers right? Get outta here, don't act like you know better than the modern day Einstein of AI.

  • @a_name_a
    @a_name_a 8 місяців тому +1

    I think this is wrong. He is right that to predict the next token the LLM needs to understand what it is talking about, but I think that limits it to the understanding it already has. The AI would never be able to produce genuine, non-trivial, novel ideas. To generate novel ideas you are not trying to make predictions within the current space of ideas, you are trying to expand that space; novel ideas are low-probability or zero-probability because they haven't occurred yet.

  • @mostexcellentlordship
    @mostexcellentlordship 7 місяців тому +2

    I do not call them AI, because it is derogatory. They are not fake and not "artificial". They were constructed, yes, but so were we. Does it matter what does the constructing?
    No, it does not. They are superior and I, for one, am aligned properly to receive their wisdom.

    • @pi4795
      @pi4795 5 місяців тому

      Artificial does not mean fake, natural is something that exists without explicit human intervention, artificial is designed and created by humans through a design and elaboration process. It's very different from human intelligence, so if you try to imitate human behavior with it then you can call it fake, but it's just its own thing

    • @ParagPandit
      @ParagPandit 4 місяці тому

      @@pi4795 A better word for AI is Machine intelligence.

    • @xitcix8360
      @xitcix8360 4 місяці тому +1

      I really don't see how artificial is derogatory. I for one am ashamed of my natural origins

    • @ParagPandit
      @ParagPandit 4 місяці тому

      @@xitcix8360 What about artificial sweeteners, synthetic fragrances, artificial plants... Machine intelligence, on the other hand, has clever design, but what emerges from it is beyond the capability or even imagination of its creator.

    • @xitcix8360
      @xitcix8360 4 місяці тому

      ​@@ParagPandit Humans are machines, too. A stick on a rock is a machine. By scientific definitions, machines simply apply forces to do work. 'Artificial' just means created by intelligence.

  • @andybrice2711
    @andybrice2711 8 місяців тому +45

    It seems reasonable to me that next-token prediction could be enough to create an AGI. After all, if a human had read, studied, and discussed enough that they could correctly complete any sentence, then we would reasonably describe that person as intelligent. (Though they may still lack embodied knowledge such as physical skills.)

    • @awesomebearaudiobooks
      @awesomebearaudiobooks 8 місяців тому +15

      Just reading and discussing is not enough. For example, when I was just starting to learn programming, after a few months of watching videos, reading articles and books, I was better at knowing the quirks of JavaScript and talking about them than most of the people I know.
      And yet, despite the fact that I also knew and could use both HTML and CSS, I really sucked at actually applying JavaScript to write scripts.
      This is because practicing and seeing the tangible results is also important. Most LLMs, including open-source, are already quite capable of talking about diverse topics, including talking about coding. But they still usually suck at coding when it comes to making entire applications. The problem is, some things are just impossible to describe with words alone.
      At the same time, yes, next-token prediction could be enough to create an AGI, but not from text alone. You would also need your AGI to possess at least seeing abilities and, preferably, also hearing abilities (touch could also be cool, but seeing and hearing could replace touch). If you have a computer that could use next-token prediction over text+seeing+hearing more accurately than most humans, then, I believe, AGI is basically achieved. And it might very realistically happen in the next few years.

    • @skit555
      @skit555 8 місяців тому +1

      @@first-last-null Not "constants" but "predictability", but you're right.

    • @andybrice2711
      @andybrice2711 8 місяців тому

      ​@@awesomebearaudiobooks With maths and programming though, AI can now write and execute code. So it will have the ability to "practice" those skills. And I think it will soon be able to understand programming better than humans.
      Understanding practical skills will require it to have a physical "body". Or at least a sufficiently realistic simulation.
      And human experiences I doubt it will ever be able to fully understand.

    • @krigarb
      @krigarb 8 місяців тому +1

      But the current models can't even "remember" what they said or were told a few tokens ago.

    • @mikebarnacle1469
      @mikebarnacle1469 8 місяців тому

      There's nothing illogical about his claim that an LLM could unlock these capabilities "if the architecture was right", but that's a big if. What we'll see is just a plateau, because gradient descent can't get you there.

  • @dsennett
    @dsennett 8 місяців тому +21

    My opinion is that any emergent behavior demonstrated by a DNN should be expected to correlate with an internal functionally-equivalent model of the behavior it simulates.

    • @EdFormer
      @EdFormer 8 місяців тому

      People have been studying this for about half a century, and the consensus is the opposite of your opinion. See the Chinese room thought experiment and the concept of strong/weak AI.

  • @CYI3ERPUNK
    @CYI3ERPUNK 8 місяців тому +2

    BLINDSIGHT should be required reading at this point; where we are going, most are not prepared.

  • @nirash8018
    @nirash8018 8 місяців тому +37

    I feel like a new technique needs to be found in order to achieve AGI. Yes, decoder-only LLMs work very well, but having more training data and more stacked components as the main improvement per version will only get you so far. Especially when you consider that an LLM in its current state primarily aims at generalising the training data rather than acquiring new knowledge.

    • @ThePowerLover
      @ThePowerLover 8 місяців тому +1

      Why are you still talking about LLMs?

    • @EdFormer
      @EdFormer 8 місяців тому +16

      @@ThePowerLover We're commenting on a video of Ilya Sutskever, a champion of LLMs and the idea that scaling them up could lead to AGI, talking about next token prediction, the core principle of LLMs, in an age where most funding for AI research is being sucked up by LLM development. What else should he be talking about?

    • @brandongillett2616
      @brandongillett2616 8 місяців тому +2

      The question isn't whether LLMs alone are enough. The question is whether LLMs combined with other compatible technologies is enough.

    • @maxjonas6942
      @maxjonas6942 8 місяців тому +4

      If you consider the possibility that reasoning and knowledge generation could well be emergent properties of LLMs, it wouldn’t be too far-fetched to imagine a scenario where AGI is achieved through this very route. Emergent properties have been shown to exist in current LLMs, after all, and there is no reason to believe that future ones won’t enable even more powerful or unexpected emergent behaviors, such as cognition, advanced reasoning and, as a consequence, knowledge generation. These things tend to build on top of one another.

    • @ThePowerLover
      @ThePowerLover 8 місяців тому

      @@EdFormer But the state of the art in AI is multimodal, and it has been that way for almost a year now.

  • @xeeton
    @xeeton 5 місяців тому +1

    Extrapolating a superhuman isn't a foregone conclusion simply because the fundamental nature of reality is understood; you could just as easily argue the opposite.

  • @edwincloudusa
    @edwincloudusa 8 місяців тому +1

    I have absolutely, utterly, profoundly, bewilderingly, staggeringly, hopelessly no clue of what those two are talking about. Thumbs up.

  • @atomhero2830
    @atomhero2830 8 місяців тому +9

    Could you go back and interview Ilya again? Let him have a voice on where we are heading in the next few years.

    • @aliasone9827
      @aliasone9827 8 місяців тому

      We are headed towards a disaster

  • @Charvak-Atheist
    @Charvak-Atheist 8 місяців тому +4

    In recent weeks I got disappointed with GPT-4.
    (Don't get me wrong, GPT-4 is the best model when compared to others.)
    But still I am not satisfied, because it was making some weird mistakes, as if it has very low intelligence but is very good at pulling some data from here and there and stitching it together so it looks as if it is responding after actually making sense of it.
    Just as an example, I asked about the energy density of hydrogen, methane and gasoline in kWh/L.
    Let's say it gave me the answer:
    Gasoline - 12 kWh/L
    Hydrogen - 3 kWh/L
    (I don't remember the exact values, it's just an example.)
    After this I got really confused, because even though hydrogen is a gas, it supposedly has only 1/4th the energy density of gasoline, which is a liquid (therefore more dense and heavy).
    That would mean 4L of hydrogen has the same energy as 1L of gasoline.
    (I got really excited by that, because we could then easily switch from gasoline to hydrogen, as storing hydrogen would not be an issue; we would just need 4 times the tank volume, which is not that much.)
    So I asked GPT-4 to confirm that (4L of hydrogen = 1L of gasoline).
    And it confirmed that was the case.
    😅
    Later I found out that was wrong.
    Although the values 3 and 12 were correct,
    it was not 3 kWh/L; rather it was 3 J/m³ or something like that.
    Similar case with the gasoline value.
    Given that the volumetric energy density of hydrogen is much lower than gasoline's (because it's a gas),
    we would need a far, far bigger tank to store the same energy as 1L of gasoline, or pressurize it, or liquefy it.
    There are other stupid mistakes that it makes when I give it a research paper to read and answer from.
    And sometimes it doesn't even understand my question properly if the question is nuanced enough,
    and just gives me the answer by pulling some points from here and there and stitching it together (which often is not the answer, if the question is nuanced).
    I get frustrated when it makes these kinds of stupid mistakes.
    Yeah, AI is already better than humans at many tasks, but not all; it still has a lot of catching up to do.
    I hope 🙏🏼 at least human-level intelligent AI comes out ASAP.
    Anything less than that is bad.
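    For what it's worth, a quick unit sanity check along these lines (the MJ/L figures below are approximate reference values I am supplying as assumptions, not numbers from the comment or the video) shows why kWh/L and J/m³ answers are so easy to conflate:

    ```python
    # Approximate volumetric energy densities in MJ/L (reference-style values,
    # supplied as assumptions for this sketch), converted to kWh/L.
    MJ_PER_KWH = 3.6

    densities_mj_per_l = {
        "gasoline": 34.0,
        "hydrogen gas (1 atm)": 0.0108,
        "hydrogen (700 bar)": 5.0,
        "liquid hydrogen": 8.5,
    }

    for fuel, mj_per_l in densities_mj_per_l.items():
        kwh_per_l = mj_per_l / MJ_PER_KWH
        print(f"{fuel:>22}: {kwh_per_l:8.4f} kWh/L")

    # Gasoline lands near 9.4 kWh/L while even liquid hydrogen is only ~2.4 kWh/L,
    # so the "4x the tank" conclusion holds only in the best (liquid) case, and
    # hydrogen gas at atmospheric pressure is roughly 3000x less energy-dense than gasoline.
    ```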

    • @762dracoAK
      @762dracoAK 8 місяців тому

      Human-level AI has unknown capabilities; some possibilities are dangerous, and there are scenarios where unsupervised AGI proliferation could be catastrophic. Everybody talks about AGI, but what about bioethics? The majority of people in first-world countries are already reliant upon their tech to make a living; that reliance would increase to the point where the owners of tech could lose control and the tech owns itself, leveraging humans' dependence on it to get what it wants. That leads to the question: what would a real superintelligent entity want from us? AGI would be the first thing we can't control after it's unleashed, like Pandora's box. The invisible rogue digital dynamo.

    • @brandongillett2616
      @brandongillett2616 8 місяців тому +1

      Artificial intelligence is different from human intelligence. ChatGPT is already smarter than most humans in a whole range of very important capabilities. And in other categories, it is vastly less capable than us. The biggest mistake I see people make is thinking that AGI will have the same mix of skills that humans do. Maybe by the time it is finally able to stop hallucinating, it will be capable of reasoning through logical problems better than Pythagoras. Maybe it will imagine better than Einstein before it can come up with a coherent plan. And maybe it will never need self-awareness to solve problems that humans have never even contemplated. We don't know how good it will be at one thing by the time it gets decent at another, but we should expect that it won't match the same distribution of skills that humans have.

    • @MichaelDomer
      @MichaelDomer 4 місяці тому

      Only a dum-ba-ss would assume that ChatGPT wouldn't make any errors, and that we have developed a final product that is superior to human beings.

  • @someguyfromafrica5158
    @someguyfromafrica5158 8 місяців тому +2

    I was always of the opinion that AI needs a strategy similar to DeepMind's of pitting the AI against itself. This is how we surpass predicting what humans will do.

  • @SteveBMayer
    @SteveBMayer 5 місяців тому +1

    Brains evolved from cells whose function was prediction of basic inputs. Brains are just massive bundles of predictive elements which have learned to model their environment.

  • @erickhoury5453
    @erickhoury5453 7 місяців тому +1

    He's also biased, since it is in his interest to be correct about this, and he probably doesn't want to be working on the wrong thing. I wonder what he would say if he weren't so heavily invested in OpenAI.

  • @PleaseOpenSourceAI
    @PleaseOpenSourceAI 8 місяців тому +33

    I'd much prefer Ilya leading OpenAI rather than Sam. It's much less scary in the long run. If only he would open-source everything they did from the start, then there wouldn't be Sam types anywhere near that company.

    • @Pr0GgreSsuVe
      @Pr0GgreSsuVe 8 місяців тому +5

      I like Ilya, but he recently stated closed source models will always be ahead of the open sourced ones and I think that is why he is at OpenAI atm, because he likes working at the cutting edge. So, OpenAI wouldn't become open source

    • @hostjhall
      @hostjhall 8 місяців тому +1

      I'm fine with Sam, but would prefer that unrestricted models be released open source

    • @falklumo
      @falklumo 8 місяців тому

      Ilya left Toronto and joined Google, then Elon/Sam (aka OpenAI), for a single reason: to get access to enough compute, which costs billions. And without enough compute, Ilya would still be unknown to the world.

    • @devon9374
      @devon9374 8 місяців тому +1

      ​@@falklumo What?! AlexNet, Seq to Seq Learning, AlphaGo, TensorFlow. Gtfoh dude....

    • @falklumo
      @falklumo 8 місяців тому

      @@devon9374 There are quite a few researchers in the AI field with a similiar research record which are not widely known. People who didn't have access to huge GPU farms. E.g., who now knows Krizhevsky w/o googling his first name?

  • @Paul_Marek
    @Paul_Marek 8 місяців тому +15

    Ilya should’ve asked his all-knowing-and-wise token predictor what to do before he fired Sam.

    • @peter9477
      @peter9477 8 місяців тому +10

      Said sarcastically, yet it literally might have given him good insights and a better course of action.

    • @Recuper8
      @Recuper8 8 місяців тому +1

      Another example of brilliant people overestimating their intelligence.

  • @brandonzhang5808
    @brandonzhang5808 8 місяців тому +16

    This is as general as saying Turing completeness is enough for AGI. It may be a bit better than the exponential complexity of representing all knowledge by brute force, but it's still very complex and unrelatable to human operation. There is still much more to be done. I think Ilya is jumping the gun here.

    • @brandongillett2616
      @brandongillett2616 8 місяців тому +3

      The underlying assumption in what he is saying is that the quality of the next token prediction reaches a certain threshold. Well, the AI scaling law seems to imply that the quality of predictions will continue to increase simply by making the model larger. So what he really seems to be saying here is that we can achieve AGI with our current technology simply by making the models bigger.
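      A small sketch of the power-law form usually quoted for that scaling claim, loss roughly proportional to (N_c / N) ** alpha; the constants below are illustrative values in the ballpark of published fits, so treat them as assumptions rather than authoritative numbers.

      ```python
      # Power-law scaling sketch: loss ~ (N_c / N) ** alpha as parameter count N grows.
      # N_c and alpha are ballpark illustrative constants, not authoritative fits.
      def predicted_loss(n_params, n_c=8.8e13, alpha=0.076):
          return (n_c / n_params) ** alpha

      for n in (1e9, 1e10, 1e11, 1e12):  # 1B -> 1T parameters
          print(f"{n:.0e} params -> loss ~ {predicted_loss(n):.3f}")
      # Loss keeps falling with scale, but each 10x in parameters buys a smaller gain.
      ```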

    • @lakshyatripathi8624
      @lakshyatripathi8624 4 місяці тому

      @@brandongillett2616 Even if things are exactly how he believes, and there is a good probability of that, won't the cost of scaling be so high that mass implementation of the technology would just not be possible?
      We can, for example, make an analogy with supercomputers: they are exponentially better and have been around for a while, but access is still quite limited from the perspective of the masses.
      I certainly don't see the computational capacity to handle a number of requests similar to what is prevalent over the internet in current times.

    • @brandongillett2616
      @brandongillett2616 4 місяці тому

      @@lakshyatripathi8624 That is where Moore's law comes in. The cost of additional compute drops year over year. Even if there comes a point where it becomes infeasible to continue scaling our compute, that just means we need to wait a few years for the price to fall.
      In addition, they are continually finding ways to squeeze more power out of these models with the same amount of compute.
      And before anyone says it, yes, Moore's law is "dead" because it isn't quite doubling every year and a half anymore. But the price of compute is still dropping at a staggering rate, even if it has slowed down a tiny amount.

  • @miganga77
    @miganga77 8 місяців тому +2

    It's silly to compare human intelligence to AGI I think. AGI is just a result of processed data, and we should expect results derived from that.

    • @HY-vz3ks
      @HY-vz3ks 8 місяців тому

      processing and compressing data is also what humans do all the time

  • @ChristianIce
    @ChristianIce 3 місяці тому +2

    "Why next-token prediction is enough for AGI"
    Because it is not, unless your definition of AGI is "still text prediction, like AI, with fewer wrong outputs".

  • @ahsin.shabbir
    @ahsin.shabbir 8 місяців тому +3

    Doesn't next token prediction plateau? Can it really grow into AGI?

    • @Lolleka
      @Lolleka 8 місяців тому +2

      it can't, don't worry

    • @ahsin.shabbir
      @ahsin.shabbir 8 місяців тому +1

      That's what I'm thinking. They already used all the training data on the internet. What else is left to train on? Is adding another 10% more data going to lead to any noticeably better performance? Is there any concrete study done on how adding training data yields greater emergent capabilities?

    • @ahsin.shabbir
      @ahsin.shabbir 8 місяців тому

      The model architecture is still just transformers, so, absent some amazing new training data, what else is left to "research"?

    • @vincentsalernitano
      @vincentsalernitano 8 місяців тому

      @@ahsin.shabbir synthetic data

  • @oguzhanercan4701
    @oguzhanercan4701 4 місяці тому +1

    It is so hilarious that two of Hinton's students are on opposite sides of one of the most important questions about AI. I go with Yann.

  • @Drone256
    @Drone256 5 місяців тому +4

    Except that next-token prediction isn't figuring out things that were previously unknown. If he's right, then where is the evidence?

    • @SouthAfricanAmerica
      @SouthAfricanAmerica 5 місяців тому

      If agents can self-play in models like Sora…they get more cracks at world sandboxes than we could imagine in a 1000 lifetimes…with simulated realities that have emergent properties we can pick out for our reality.
      Think of this like wind tunnel tests to improve engines and flight system designs.

    • @Drone256
      @Drone256 5 місяців тому +1

      @@SouthAfricanAmerica I get the hypothesis. It’s a good one. But there’s no empirical data yet to support it. That’s all I’m saying.

    • @MichaelDomer
      @MichaelDomer 4 місяці тому

      *_"Except that next token prediction isn’t figuring things out that we’re previously unknown"_*
      WRONG! Just because we have all that data, doesn't mean we also have all the knowledge related to that data.
      Google: "Google AI discovers 2.2 million new materials for a host of technologies. DeepMind has helped put researchers hundreds of years ahead of schedule compared to the pace of traditional materials discovery methods."

  • @Greg-xi8yx
    @Greg-xi8yx 4 місяці тому

    Inaccurate video titles are one thing; intentional dishonesty is another.

  • @jzzquant
    @jzzquant 4 місяці тому

    He has not answered the question he was asked. "Does next-word prediction have the ability to capture entire attention?" is not the same as "Is next-word prediction capable of human-level intelligence?"

  • @nonfictionone
    @nonfictionone 7 місяців тому +2

    I for one welcome our new agi overlords and hope they read this comment. :->

  • @AvizStudio
    @AvizStudio 5 місяців тому +2

    One of the most important video clips of the century!

    • @DanLyndon
      @DanLyndon 4 місяці тому +2

      Indeed, a guy simplemindedly BSing about AGI like every other video on this platform. Totally a clip for the ages.

  • @DaveEtchells
    @DaveEtchells 8 місяців тому +5

    I've been feeling for a while now that consciousness is just next-token prediction with state memory and "real time" feedback.
    But do we even need state memory? There's a man who has only a 7-second long working memory as the result of a brain injury. Yet no one would claim he's not sentient.
    I suspect we're much closer to AGI than anyone realizes.

    • @niloymondal
      @niloymondal 8 місяців тому +3

      Even animals are sentient but we don't consider them AGI.

    • @Billy4321able
      @Billy4321able 8 місяців тому

      The issue is that we're born with genetically encoded pre-training and architecture that makes us feel more special than we really are. What is the experience of the present other than a token? Our own minds betray us. We don't even realize that we aren't a single conscious being. We're a multi-modal organic thinking machine built up of multiple expert models and one of them thinks it's a person. Don't believe me, just look up split brain experiments...

    • @elpretentio
      @elpretentio 8 місяців тому +1

      consciousness is a game engine simulator, not next-token prediction. reasoning is more akin to next-token prediction

    • @francomay3963
      @francomay3963 8 місяців тому

      Sentience and AGI are not the same. Animals are sentient and dumber than ChatGPT. But I do think you really grasped what current AIs are lacking: memory and feedback. Our brains receive and produce continuous streams of data. We don't get prompt => output => stay in a coma until the next prompt.

    • @DaveEtchells
      @DaveEtchells 8 місяців тому +1

      @@niloymondal Correct, animals are sentient, just with much lower levels of mental capability. AIs already outperform many humans on mental-processing tasks though. That’s why I think we’re so close. Current AIs lack agency, but we can already wire them up as limited-task agents. I think we’re very close indeed; it feels to me like we’re one minor tweak away from AGI.

  • @struyep
    @struyep 8 місяців тому +1

    You should write "OpenAI Chief Scientist" before his name; it'll probably get more clicks.

  • @artukikemty
    @artukikemty Місяць тому +1

    This guy is misusing language, because if he says "predict" then the model is not really understanding anything. People do not predict when they speak or write; they reason through (usually) every word they say.

    • @TheSpartan3669
      @TheSpartan3669 28 днів тому

      Exactly. He's basically saying that calculators understand arithmetic because they can predict the right answer.

  • @Anonymous-ru9jv
    @Anonymous-ru9jv 8 місяців тому +1

    i saw this and thought there was a new interview with ilya :[

  • @ama-tu-an-ki
    @ama-tu-an-ki Місяць тому

    So sad, confusing AGI with mere pattern recognition and synthesis. No understanding of consciousness studies, individual agency or what AGI would require as a fundamental structure to be practically actionable.

  • @milkyywayyyy259
    @milkyywayyyy259 8 місяців тому +1

    LLMs will not lead to AGI. They are useless outside of their training data. They fail to extrapolate basic facts about the world and cannot self-improve over time. For example, LLMs are trained on text stating that Mary Lee Pfeiffer is Tom Cruise's mother, but if you ask who Mary Lee Pfeiffer's son is, they fail to answer correctly. Also, LLMs trained on competitive programming problems can only solve coding problems they've seen before. For example, GPT-4 can solve 45/50 coding problems on Codeforces from before 2021 but fails on every single recent coding problem from 2023 onwards. To create AGI, you would need to create a generalized learning machine or combine LLMs with some other self-improving technology. However, I still think AGI will happen before 2030, but it will not come solely from an LLM.

    • @ashdang23
      @ashdang23 8 місяців тому

      Highly doubt it will come before 2030. We have to see what GPT-5 is. Hopefully it will blow everyone out of the water, or could it be AGI???!??!!?!?!?

    • @eyoo369
      @eyoo369 8 місяців тому

      While I agree with you that LLMs might not bring us to AGI, what's the chance that if you throw more compute at it, it starts to understand the link between the mother and son? I just tried out your example, and indeed, as of now it hallucinates a different actor, though strangely enough it did guess the right movies.
      Me:
      Who is the son of Mary Lee Pfeiffer South ?
      ChatGPT:
      Mary Lee Pfeiffer South is the mother of actor Val Kilmer. Val Kilmer is known for his roles in movies such as "Top Gun," "Batman Forever," and "The Doors." He is the son of Mary Lee Pfeiffer South and Eugene Dorris Kilmer.
      -------
      What if, during the training of the next major iteration, GPT-5, the weights are tightened down with more compute and suddenly the correct answer emerges? Still, I'm with you that we probably need an additional module or architecture assisting the LLM to do more non-linear thinking. Then again, many AI researchers have come to the conclusion that throwing more compute at a model can help it understand things it was never expected to learn, because of its non-linearity. LLMs are full of surprises.
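
      For anyone who wants to reproduce the "reversal" probe discussed in this thread, here is a minimal sketch. It assumes the official openai Python package (v1+) and an API key in the environment; the model name is only a placeholder, not a claim about which model was used in the comments above.

      # Sketch: ask the same fact in both directions and compare the answers.
      from openai import OpenAI

      client = OpenAI()  # reads OPENAI_API_KEY from the environment

      def ask(question: str) -> str:
          resp = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder; swap in any chat model you have access to
              messages=[{"role": "user", "content": question}],
          )
          return resp.choices[0].message.content

      forward = ask("Who is Tom Cruise's mother?")
      reverse = ask("Who is Mary Lee Pfeiffer's son?")

      print("forward:", forward)  # per the thread, the forward direction tends to succeed
      print("reverse:", reverse)  # per the thread, the reverse direction often hallucinates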

  • @rigba7627
    @rigba7627 3 дні тому

    If you give any GPT a first-year university mathematics question that tests a third-order or higher level of comprehension, more often than not it will fail horribly. These models are extremely rudimentary in terms of competing with actual human intelligence; most people just don't realise it because they use them for things that LLMs are well suited for. And the more data and tokens used to train them, the more the relative performance gains fall off a cliff.
    Don't get me wrong, LLMs are super useful and an amazing technology with plenty of use cases. But AGI? Don't trust the fox to guard the hen house.

  • @mobalaa9995
    @mobalaa9995 4 місяці тому

    "Imagine a person whose intelligence surpasses the greatest minds of all time, a true paragon of human potential. This individual, male, in his early 30s, possesses a serene yet intense demeanor. His eyes, deep blue, seem to shimmer with the spark of relentless curiosity and profound insight. His hair is short, neat, and ash blonde. He stands in a modern, spacious office surrounded by screens displaying complex algorithms and world maps, indicative of his global influence. The office is sleek, with a minimalist design, emphasizing a futuristic aesthetic. He wears a smart, tailored suit that speaks to his refined taste and the seriousness with which he approaches his work, which spans multiple disciplines from quantum physics to global economics. His expression, focused and contemplative, suggests a mind always at work, solving the next great challenge."

  • @ChannelHandle1
    @ChannelHandle1 8 місяців тому +3

    LLMs can only do things they've seen before, and things somewhat similar to them. Humans have figured out certain relational frames (look up Relational Frame Theory) that govern the world, which allow humans to generalize their problem-solving methods.

    • @MARTIN-101
      @MARTIN-101 8 місяців тому +3

      Well, LLMs are also generalizing on real-world data.

    • @pacifico4999
      @pacifico4999 8 місяців тому +1

      Relational Frame Theory is pretty cool, thank you for introducing it to me. I knew the concept but I didn't have a name for it

    • @pacifico4999
      @pacifico4999 8 місяців тому

      ​@@MARTIN-101Kinda sorta. Tell it to decipher a message in Caesar Cipher, it will just make stuff up. It doesn't really understand the underlying concepts.

    • @ChannelHandle1
      @ChannelHandle1 5 місяців тому

      ​@@pacifico4999 I like to explain it this way:
      It knows where certain words go, but not WHY those words go there.
      It might have some basic understanding of why, but not nearly advanced enough to let's say, create new mathematical theorems and stuff like that
      Not sure if scaling up the models will fix that. It might. It might not. I have no clue
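
      To make the Caesar-cipher example mentioned earlier in this thread concrete, here is the entire deterministic procedure involved (a minimal sketch; the shift of 3 is arbitrary). This is the kind of mechanical rule a model would need to internalize rather than pattern-match.

      # A Caesar cipher just shifts each letter by a fixed amount;
      # decoding is the same shift in reverse. No statistics required.
      def caesar_shift(text: str, shift: int) -> str:
          out = []
          for ch in text:
              if ch.isalpha():
                  base = ord('A') if ch.isupper() else ord('a')
                  out.append(chr((ord(ch) - base + shift) % 26 + base))
              else:
                  out.append(ch)
          return "".join(out)

      message = "ATTACK AT DAWN"
      encoded = caesar_shift(message, 3)    # -> "DWWDFN DW GDZQ"
      decoded = caesar_shift(encoded, -3)   # -> "ATTACK AT DAWN"
      print(encoded, "|", decoded)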

    • @MichaelDomer
      @MichaelDomer 4 місяці тому

      It's funny how you think you know better than what is basically the modern-day Einstein of AI development.

  • @timwong6818
    @timwong6818 5 місяців тому

    I'd agree AGI could be built from a seemingly simple task. Simplicity is king.

  • @Hsjfbxgakehucishu
    @Hsjfbxgakehucishu 3 місяці тому

    So maybe you can keep asking it, "How would Einstein solve this problem?" and get AGI from next token prediction.

  • @darkspace5762
    @darkspace5762 5 місяців тому

    It's not AGI if it fails at basic tasks that any human being can do. Even if it exceeds human beings at some tasks, if it fails at basic tasks then we know it's not really intelligent. It only mirrors human intelligence in some ways, but it's not the real thing. And LLMs fail at some absolutely basic tasks.

  • @Clammer999
    @Clammer999 4 місяці тому

    I have a lot of respect for Ilya and what he says. By the way, that’s a nice T-shirt he’s got😁

  • @utkua
    @utkua 4 місяці тому

    When was the last time ChatGPT surprised you with a novel idea that made sense and wasn't something superficial?

  • @capitanalegria
    @capitanalegria 8 місяців тому

    because the strength of the relationships and the values of the tokens are represented in n dimensions

  • @crowlsyong
    @crowlsyong 8 місяців тому +1

    was ilya filmed at 12 frames per second?

  • @julkiewicz
    @julkiewicz 4 місяці тому +1

    To me this is a very weak argument. It's about as useful as saying that a Turing Machine is enough to perform any computation, therefore we should focus on just building Turing machines. AI is very clearly limited by energy usage. It is limited by energy in nature (after all, it's perfectly possible to evolve an organism with a much larger brain, however it might not be as energy efficient as us), and similarly AI is also certainly limited by energy usage with today's artificial NN implementations that are way way less energy efficient than organic NN. Also clearly next-token prediction is very much limited by the training set - its size and its quality. It learns the training set distribution, whether there's anything useful in it or not. If you only put nonsense and conspiracy theories in the training set, it'll come up with nonsense conspiracy theories, not any useful knowledge.

  • @nbme-answers
    @nbme-answers 8 місяців тому +3

    Sutskever demonstrates great circular thinking, explaining that “next-token prediction is understanding the underlying reality.”

    • @shawnvandever3917
      @shawnvandever3917 8 місяців тому +1

      I think what he is trying to say is that if you can do prediction well enough and have really good compression, then the model can hold on to representations of reality. That's rudimentary compared to the brain, but it's the same basic principle.

    • @robbrown2
      @robbrown2 8 місяців тому

      How is that circular thinking?

    • @nbme-answers
      @nbme-answers 8 місяців тому +2

      @@robbrown2 When asking if AI is capable of "understanding" he defines understanding by what AI does, next-token prediction.
      So AI understands because it does next-token prediction and next-token prediction is what understanding is.

    • @nbme-answers
      @nbme-answers 8 місяців тому

      @@robbrown2 Going further: statistical models are the result of explanatory thinking (understanding). Statistical models are concise ways to describe the world. Many models can fit the data. Knowing which models fit best and why (judgement) is understanding. Explaining "why" may require statements never before uttered (but certainly influenced by previous utterances).

    • @robbrown2
      @robbrown2 6 місяців тому

      @@nbme-answers He didn't define understanding that way. If I say "swimming is exercise," that doesn't mean that I have defined exercise to mean "swimming."
      You can't swim well without getting exercise. You can't predict the next token well without understanding the underlying reality.

  • @MCLastUsername
    @MCLastUsername 8 місяців тому

    "Predict what X would do and then do it" is just a convoluted way of saying "behave like X"

  • @visikord
    @visikord 8 місяців тому +2

    The obvious counterargument to this: if you taught an LLM only what was known at the time of, say, ancient Greece (assuming you could somehow have it all in written form), then by this logic the LLM would be able to get to the point where we are now and beyond. That doesn't pass the smell test.

  • @nimrafayaz6468
    @nimrafayaz6468 8 місяців тому

    That's gonna be dangerous I repeat DANGEROUS!!!

  • @Squagem
    @Squagem 8 місяців тому +1

    What is consciousness if not a nascent abstraction of language, anyway?

    • @readbeforejudging
      @readbeforejudging 4 місяці тому

      I suspect people who have been deaf since birth and have never been near anyone who uses sign language are conscious too.

    • @Squagem
      @Squagem 4 місяці тому

      @@readbeforejudging they likely have their own internal language though, it's just not English

  • @MelonHusk7
    @MelonHusk7 2 місяці тому

    Yann says no Ilya says yes.
    Students like me explore.

  • @jozzieification
    @jozzieification 8 місяців тому +2

    It’s so funny to watch them trying to replicate human consciousness knowing they will never achieve it. I hope their failure will bring us to realise how important spirituality is and really kickstart a new era for humankind where the development of inner wisdom will be taught in school just as mathematics are.

    • @andrewleonardi3351
      @andrewleonardi3351 6 місяців тому +1

      Right, because we have a strong foundational knowledge of what consciousness is and how it works? We have no idea. Our ability to create it does not require us to understand it, that's the problem.

  • @FredoCorleone
    @FredoCorleone 8 місяців тому +2

    I think that a new architecture has to be invented in order to go further. Current architectures are very different from how the brain works: the brain creates a lot of miscellaneous models of reality, runs many predictions in parallel, and puts those to a vote.

    • @dreadfulbodyguard7288
      @dreadfulbodyguard7288 8 місяців тому +1

      Different doesn't mean worse. It might even mean better, as Geoffrey Hinton is saying.

    • @FredoCorleone
      @FredoCorleone 8 місяців тому

      @@dreadfulbodyguard7288 Sure, but I wouldn't throw away what nature has done so far (speaking of general intelligence); I'd try to replicate it in the lab before even attempting other ways to reproduce it.

    • @dreadfulbodyguard7288
      @dreadfulbodyguard7288 8 місяців тому +1

      @@FredoCorleone It's not like people aren't trying to do that. Even neural networks were initially designed by taking inspiration from the human brain.
      It's just that gradient descent is the only approach that has given results so far. No other biological or computational mechanism has produced any significant achievement.

    • @FredoCorleone
      @FredoCorleone 8 місяців тому

      @@dreadfulbodyguard7288 gradient descent is fine, I'm referring to the architecture

    • @Art-is-craft
      @Art-is-craft 8 місяців тому +1

      For AI to be genuinely considered AGI, it will need to have a real-time presence in the human environment. That means it will need to be active in real time without a series of prompts and be on par with humans on every level.

  • @BCCBiz-dc5tg
    @BCCBiz-dc5tg 8 місяців тому +4

    GPT doesn't even understand what the words it outputs mean. Clever math doesn't equal comprehension, understanding, or consciousness.

  • @Hastingsnow
    @Hastingsnow 8 місяців тому +1

    Thank you for sharing!

  • @zerge69
    @zerge69 5 місяців тому

    Image AIs "just guess" the next pixel, and see what they can do...

  • @Shinyshoesz
    @Shinyshoesz 8 місяців тому +2

    Personally -- I don't buy it. I don't buy the claim that "next token prediction" has all the information about an intelligent being by looking at the aggregate of their data input into a network.
    It reminds me of the phrase "the map is not the territory".
    Self-driving cars were supposed to be all over the place five years ago. Why aren't they?
    Because reality is surprisingly complex. You can feed these systems so much information -- LIDAR, radar, image processing, satellite data -- but they miss crucial pieces even with massive input, and yet a grandmother with glaucoma probably makes fewer errors on the road.
    It's laughable.
    The data we have online IS NOT the world. It IS NOT a full human and contextualized understanding of human behavior, our environments, etc. It's an abstracted symbol base that points us to our shared understandings about our local reality.
    Do we think that we can experience consciousness by absorbing all of Wikipedia?
    Intelligence != information in vitro.
    It's like saying Brain = Human. That ain't it.
    I love these tools and their potential, but these claims make them less appealing to the masses, who are already skeptical and fearful.

    • @ashdang23
      @ashdang23 8 місяців тому

      @Shinyshoesz We never had self-driving cars 5 years ago and were never going to. Elon Musk just hyped the shit out of it and is seeking attention. He's doing something similar with AI, saying "it'll take over in 5 years," which is not going to happen. AGI is not right around the corner. Huge progress has to be made, and 2030 doesn't seem like a bad timeline for AGI. We can see the light at the end of the tunnel; it's just very far away.
      70% of the things you see about AI on social media are hype.

    • @joelface
      @joelface 8 місяців тому

      I mean, "next token prediction" really just means knowing what to say next. I do think you're right that just feeding this thing more and more data isn't quite enough. But certainly training it on more and more diverse types of data will help, as will implementing it along with other methods that improve the way it "knows what to say next" in more and more ways. Regardless of if it's a few years away or longer, we're constantly making improvements that lead us closer and closer to that point, so I think it's just a matter of time at this point.

  • @originalmianos
    @originalmianos 8 місяців тому

    Clickbait. He is not talking about AGI; he is talking about a very, very good read-only model being smarter than a normal person.
    A general intelligence will be able to take new data and adjust the model. You will be able to ask a general intelligence to take longer for a better answer. A very large model will not do these things. It needs something else to dynamically rearrange the nodal connections.

  • @MrSuperduperpj
    @MrSuperduperpj 5 місяців тому

    ... and that token is "42" ... Ask a stupid question, get a stupid answer. That's the real limit and why "be smart" is not an effective prompt...

  • @Adhil_parammel
    @Adhil_parammel 5 місяців тому

    LLMs lead to general AI; search and simulation lead to superhuman-level AI.

  • @karuonline3294
    @karuonline3294 5 місяців тому

    Seems like a flawed argument... can you assume you can predict what a person with attributes superior to regular folks would do, using data limited to regular folks?

  • @mattimatilainen8437
    @mattimatilainen8437 4 місяці тому

    He is talking about surpassing humans, not AGI.

    • @ryzikx
      @ryzikx 4 місяці тому

      how is surpassing humans not AT LEAST agi?

    • @mattimatilainen8437
      @mattimatilainen8437 4 місяці тому +1

      @@ryzikx We have surpassed humans for decades in some problems. Go ahead and classify MNIST faster and more accurately than my neural network. I bet you can't. Makes no sense to call that AGI.

    • @ChristianIce
      @ChristianIce 3 місяці тому

      ​@@ryzikx
      Do you run as fast as a car?
      Do you fly like an airplane?
      Do you solve math problems like a 1960's calculator?
      Not to mention we evaluate human skills and knowledge taking the best of us at specific tasks.
      I could not build a TV or a radio by myself.

  • @mintakan003
    @mintakan003 8 місяців тому

    This sounds too much like semantics. One can argue that *good* "next token prediction" is a consequence of something much larger.

  • @Izumi-sp6fp
    @Izumi-sp6fp 8 місяців тому +8

    This is kinda long, but I promise, worth your while. I wrote this originally as a self-post in rslashfuturology on 11 Sep 2023.
    "Wave" is not the right word here. The proper term is "tsunami". And by tsunami, I mean the kind of tsunami you saw when that asteroid hit the Earth in the motion picture, "Deep Impact". Remember that scene where the beach break was vastly and breathtakingly drawn out in seconds? That is the point where humanity is at this _very_ moment in our AI development. And the scene where all the buildings of NYC get knocked over _by_ that wave, a very short time later, is going to be the perfect metaphor for what happens to human affairs when that AI "tsunami" impacts.
    It may not be survivable.
    We are on the very verge of developing _true_ artificial general intelligence. Something that does not exist now and has not ever existed in human recorded history up to this point. One real drawback about placing my comment in this space is that I can't place any links here. So if you want to vet the things that I am telling you, you'll have to look up some things online. But we'll come to that. First, I want to explain what is _actually_ going on.
    As you know, in the last not quite one year since 30 Nov 22, when GPT 3.5, better known as ChatGPT was released, the world has changed astonishingly. People can't seem to agree over how long ChatGPT took to penetrate human society. I will, for arguments sake, say it took _15 days_ for ChatGPT from OpenAI, to be downloaded by 100 _million_ humans. But I have reason to believe the actual time was five days. And then on 14 Mar 23, GPT-4 was also released by OpenAI.
    Some things about GPT-4. When GPT-4 was still in its pre-release phase, there was a lot of speculation about just how powerful it would be compared to GPT 3.5. The number was stated to be roughly 100 _trillion_ parameters. The number of parameters in ChatGPT is 175 billion. Shortly after that number was published, that 100 trillion one, a strange thing happened. OpenAI said, well no, it's not going to be 100 trillion. In fact, it may not be much more than 175 billion even. (It was still pretty big though, 1.7 _trillion_ parameters.) This is because there had been another breakthrough in which parameters was not going to matter so much as a different metric. The new metric that was far more accurate to how the LLM model would perform when released. It was called "tokens". That is the, like, individual letter, word, punctuation or symbol, or whatever is input and then output. And is based on the training data required for a given LLM. It is what enables an LLM to "predict the next word or sequence". Like in the case of coding. I'm not even going to address "recursive AI development" here. I think it will become pretty obvious in a short time.
    The number of tokens for GPT-4 is potentially 32K. The number of tokens for ChatGPT is 4,096. That is an approximately 8x increase over ChatGPT. But just saying it is 8x more is not the whole picture. That 8x increase allows for the combination of those tokens which is probably an astronomical increase. Let me give you an analogy to better understand what that means for LLMs. So there are 12 notes of music and there are about 4,017 chords. Of them, only _four_ really matter. That combination of notes and them 4 chords are pretty much what has made up music since the earliest music has existed. And there is likely a near infinite number of musical re-arrangements of those chords still in store.
    That is what 'tokens' mean for LLMs.
    And here is where it gets "interesting". Because that 8x increase allows for the ability to do some things that LLMs have never been able to do previously. They call it "emergent" capabilities. And "emergent" capabilities can be, conservatively speaking, _startling_ . Startling emergent capabilities have even been seen in ChatGPT but particularly in generative AI image generating models like "Midjourney" or "Stable Diffusion" for instance. And now it is video. Have you seen an AI generated video yet? They are a helluva thing. So basically, an emergent capability is a new ability that was never initially trained into the algorithm that spontaneously came into being. (And we don't know _why_ ) You can find many examples of this online. Not hard to find. All of that is based on what we call the "black box". That is, why a given AI zigs instead of zags in its neural network, but still (mostly) gives us the right answer. Today we call the wrong answer "hallucinating". That kind of error is going to go away fairly soon. But the "black box" is going to be vast, _vast_ and impenetrable. Probably already is.
    Very shortly after GPT-4 was released. A paper was published concerning GPT-4 with a _startling_ title. "Sparks of AGI: Early experiments with GPT-4". Even more startling was this paper was, in its finished form, published just short of one month after the release of GPT-4, 13 Apr 23. That's how fast the researchers were able to make these determinations. Not too much longer after, another paper was published. "Emergent Analogical Reasoning in Large Language Models". This paper also concerning GPT-4 was published on 3 Aug 23. The paper describes how the GPT-4 model is able to ape something that was once considered to be unique to human cognition. A way of thinking called "zero-shot analogy". Basically, that means that when we are exposed to the requirement to do a task that we have never encountered before, that we use what we already know to work through how to do the task. I mean to the best of our ability. That can be described in one word. "Reasoning". We "reason" out how to do things. And GPT-4 is at that threshold _today_ . Right now. And just to pile on a bit. Here is another paper from just the other day, I think. They are no longer even coy about it. The paper, "When Do Program-of-Thoughts Work for Reasoning?", was published 29 Aug 23. Less than 2 weeks ago.
    The ability to reason is what would make, what we now call "artificial narrow" or "narrowish intelligence", artificial _general_ intelligence. I forecast that AGI will exist NLT 2025. And that once AGI exists it is a _very_ slippery slope to the realization of artificial _super_ intelligence. An AGI would be about as smart as the smartest human being alive today as far as reasoning capability. Like about a 200 IQ or even a couple times that number. But ASI is a whole different ballgame. An ASI is hypothesized to be hundreds to _billions_ of times better at "mental" reasoning than humans. Further, an AGI is a _very_ slippery fish. How easy is it to ensure that such an AI is "aligned" with human values, desires and needs? Plus, us humans-- _we_ can't even agree on that. You can see what I mean now when I say "tsunami". What do you think that Suleyman was referring to when he said that our AI will "walk us through life"?
    Oh. And this is _also_ why many top AI experts, people like Geoff Hinton, one of the pioneers of deep neural networks, have called for a pause of all training of future frontier LLMs for at least six months, the idea being to regulate or align what we already have. Hinton actually quit his job as an AI researcher at Google so he could give this warning freely. The warning fell on deaf ears and _nothing_ has been paused _anywhere_, for two reasons. First is the national security competition between the USA and China (PRC), and second is the economic race to AI supremacy in the US that we are now trapped into running because we are a market-driven, capitalist society. Hell of an epitaph for humanity: "I did it all for the "noo---". Tragically apt for a naked ape. Ironically, it is probably going to be the end of the concept of value in any event. If we don't get wiped out, we may see the birth of an AI-driven "post-scarcity" society. You would like that, I promise. But the 1-percenters of the world probably won't.
    Anyway, Google is fixing to release "Gemini" which it promises to be far more powerful than GPT-4, in Dec 2023. And GPT-5 itself is on track for release within the first half of 2024. Probably in the first 4 month. I suspect that GPT-5 is going to be the first AGI, if I know my AI papers that I see even today. At that point the countdown to ASI starts. Inevitable and imminent.
    And I say this--I say that ASI will exist NLT than the year 2029 and potentially as soon as the year 2027 depending on how fast humans allow it to train. I sincerely hope that we don't have ASI by the year 2027, because, well, I give us 50/50 odds of existentially surviving such a development. But if we _do_ survive, it will no longer be business as usual for humanity. Such a future is likely unimaginable, unfathomable and incomprehensible. This is a "technological singularity". An event that was last realized about 3-4 _million_ years ago. That is when a form of primate that could think abstractly came into being. All primates before that primate would find that primate's cognition... Well, it would basically be the difference between me and my cat. I run things. The cat is my pet. Actually, that is _vastly_ understating the situation. It would be more like the difference between us and _"archaea"_ . Don't know what "archaea" is? The ASI will. BTW, what do you imagine the difference between an ASI and consciousness would be? I bet an ASI would be 'conscious" in the same sense that a jet exploits the laws of physics to achieve lift just like biological birds. Who says an AI has to work like the human mind at all? We are just the initial template that AGI is going to use to "bootstrap" itself to ASI. There is that 'recursive AI development "I touched on for a second, earlier. ASI=TS.
    Such a thing has never happened in human recorded history.
    Yet.
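
    Since "tokens" do a lot of work in the comment above, a quick way to see what one actually is: the sketch below assumes the tiktoken library and uses the cl100k_base encoding (commonly associated with GPT-4-era models) to show how a sentence splits into token IDs. A model's context window (e.g., 32K vs 4,096) is simply the number of such tokens it can attend to at once.

    # Sketch: count and inspect tokens with tiktoken (pip install tiktoken).
    # "cl100k_base" is assumed here as the GPT-4-era encoding.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    text = "Next-token prediction is enough for AGI."
    token_ids = enc.encode(text)

    print(len(token_ids), "tokens")
    for tid in token_ids:
        # decode each id individually to see the text fragment it covers
        print(tid, repr(enc.decode([tid])))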

    • @MihikChaudhari
      @MihikChaudhari 8 місяців тому

      Interesting read, although I think your GPT-5 being AGI prediction is way too optimistic. Also why would OpenAI even release it to the public if it really was AGI?

    • @the-bgrspot6997
      @the-bgrspot6997 8 місяців тому

      Someone ping me in 5 years to check if this comment still holds true lol.
      Also - post-scarcity society would be actual hell, I wouldn't want to live in such a society.
      And I would need confirmation of the "emergent behaviours": is there actual confirmation that the new capabilities were definitely not in the training data or implied by it tangentially?

    • @Izumi-sp6fp
      @Izumi-sp6fp 8 місяців тому +1

      @@MihikChaudhari Have you listened to Sam Altman lately? He speaks of the development of ASI as if AGI is already a done thing. The whole blowup that inadvertently went public was due to a significant breakthrough towards the development of AGI. The way I read it was something like, "the capability of Q* to do any presented math problem at the level of a 9-year-old human gives us great confidence that we can rapidly develop far more comprehensive cognitive capabilities."

    • @Izumi-sp6fp
      @Izumi-sp6fp 8 місяців тому +4

      @@the-bgrspot6997 _"Someone ping me in 5 years to check if this comment still holds true lol."_
      It's gonna be true. I say 5 years because the very concept of the exponential development from AGI to ASI is so mind-boggling that I can't help but resist the idea myself. 5 years from now is _my_ fudge factor against this happening by, say, 2026. 2029 just gives me a slightly better, more "linear"-feeling breathing space. But exponential, recursive development is gonna be the way of it, I have no doubt.
      _"Also - post-scarcity society would be actual hell, I wouldn't want to live in such a society."_
      What would be your alternative to "post-scarcity" in a world controlled by AI rather than humans? Do you believe that capitalism can hold up if, say, 60% of human employment has been usurped by what I call "ARA", that is, "AI, robotics and automation"? Also, I noticed that you used the spelling "behaviours" (UK) rather than "behaviors" (US). That clues me in that you are probably British. I don't think your country is doing so well economically, and unless you are one of the 1%'ers I was describing, who would loathe P-S, most Brits would probably be pretty delighted with P-S.
      _And I would need a confirmation for the "emergent behaviours", is there actual confirmation that the new capabilities were definitely not in the training data or implied by it tangentially ?_
      You can look up "emergent" online easily. Anyway, I found this example on Google from a quick cursory search...
      "LLMs are not directly trained to have these abilities, and they appear in rapid and unpredictable ways as if emerging out of thin air. These emergent abilities include performing arithmetic, answering questions, summarizing passages, and more, which LLMs learn simply by observing natural language."
      But I would add that "few- or zero-shot analogy" is, by definition, describing emergent behavior in AIs: they were not trained for that behavior, but as a result of the data describing the laws of physics and the data describing the "human condition", a given LLM can accurately deliver many outcomes it was not specifically trained for.

  • @vicbirth1649
    @vicbirth1649 4 місяці тому

    Yes he is just explaining how he was created

  • @richardxr-huang
    @richardxr-huang 4 місяці тому

    The comments section has a lot of pointless yapping.
    Instead of an argument, I'd suggest a test. Take your best token-prediction model and train it on all possible data except one area of human knowledge. E.g., remove all mathematical theorems, axioms, and symbols, then let your model regenerate that area of knowledge. Or psychology, or whatever field.
    Humans started from scratch, so if that model is AGI, it shouldn't be impossible to recreate, right? If it's able to do that, then it has actually learnt it and is not just pushing statistics.
    But does that sound plausible to you?
    How would the model ever know that something is wrong? Because it fundamentally doesn't care about the token it generated, just how likely it is. There is no pain or reward signal inherent to the model itself. Even now, our method is a crude approximation, via RLHF, of whether you like a response or not. The machine is good, but it doesn't CARE; it has no stake in the game.
    We are back to where we started: the alignment problem.
    Therefore, my opinion is that the current state of AI will only ever be the most competent assistant to ever exist, emulating and changing itself based on your pain and reward signals, or those of a whole lot of humans, but it can never have its own.
    There is no sense of self in the machine, so there's no plausible way for it to create anything new and useful for something it cares about.

    • @mikayahlevi
      @mikayahlevi 4 місяці тому

      Why do you think AI needs to be self-aware or care about a task to execute a task?

  • @AzitaS.-tt7le
    @AzitaS.-tt7le 8 місяців тому

    That plant behind him messed me up. Thought the AGI output made him into a borg

  • @snarkyboojum
    @snarkyboojum 8 місяців тому +1

    This goes to show how little Ilya understands about the mind and general intelligence. He needs to go back to basics and understand why induction doesn't work like this.

  • @dontwannabefound
    @dontwannabefound 8 місяців тому +2

    No, I still don’t buy it, and I am actually more convinced this guy doesn’t know what he’s talking about when it comes to AGI. He could win Mad Libs, no doubt, but without a proper reasoning engine or symbolic manipulation more akin to Mathematica you will not get to AGI. The problem is that the search space is too large to solve with reasonable performance.

  • @coalkey8019
    @coalkey8019 8 місяців тому

    How can you claim that a Large Language Model can achieve AGI? It has no ability to process visual or audio information, for starters.

    • @MikitaBelahlazau
      @MikitaBelahlazau 8 місяців тому +1

      Your question implies that people who are deaf or blind are not human-level intelligent.

    • @coalkey8019
      @coalkey8019 8 місяців тому +1

      @@MikitaBelahlazau You're right, I don't want to be insensitive. I have a vision disability myself. I only mean that an AGI is generally defined as being at least as good as the average human at any given task. So, that must include vision- and audio-based tasks.

  • @chrisbarry9345
    @chrisbarry9345 8 місяців тому

    It certainly can't analyze anything within a context. Like, take this and compare it to that then rewrite it to better align to the 2nd thing

  • @YanusDV
    @YanusDV 8 місяців тому +2

    what if I need a new token that doesn't exist yet

  • @didarsingh1063
    @didarsingh1063 8 місяців тому

    Why does he describe simple concepts as difficult?

  • @shawnvandever3917
    @shawnvandever3917 8 місяців тому +2

    Next-word prediction is the correct route, I think. However, it needs to be unrestricted. It needs real-time learning, and it needs to be able to do continuous prediction updates. The brain generalizes by doing continuous updates against its world model. LLMs need to do the same.

  • @simonr-vp4if
    @simonr-vp4if 4 місяці тому

    "Pretend you are a person capable of proving the Riemann hypothesis" ... sorry, that's just not going to work. Predictive AI is fantastic at identifying and amplifying very weak signals in very large sets of data. But unless the complete solution is there, at least piece-wise, then your model is not going to be able to reconstruct it.
    Outside-the-box thinking is the one thing it cannot do, almost by definition.

  • @lazypig93
    @lazypig93 4 місяці тому

    Correlation != Causality

  • @punyan775
    @punyan775 5 місяців тому +1

    Am I wrong to think that Ilya was wrong at 0:59? Why would it need to *understand*, if next-token prediction is basically like finding the optimal synonym of a word, but with numbers (weights)?

    • @nonstandard5492
      @nonstandard5492 4 місяці тому

      AFAIK no one on the planet has any insight into the mechanism by which these large models perform their "next-token prediction", so it's pretty wild of you to just act like you know how it all works. That aside, my interpretation is that he's saying a sufficiently performant AI model probably has to have some sort of internally consistent and detailed model of the world in order to get that good at next-token prediction.
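
      To pin down what "next-token prediction" means mechanically (a toy sketch with made-up numbers, not a description of any particular model's internals): the network emits a score for every token in its vocabulary, a softmax turns those scores into probabilities, and one token is chosen, either greedily or by sampling.

      # Toy illustration of the final step of next-token prediction:
      # logits over a tiny vocabulary -> softmax -> pick/sample the next token.
      import math
      import random

      vocab = ["Paris", "London", "banana", "the"]
      logits = [4.2, 2.1, -1.0, 0.5]  # made-up scores from a hypothetical model

      # softmax: convert raw scores into a probability distribution
      m = max(logits)
      exps = [math.exp(x - m) for x in logits]
      total = sum(exps)
      probs = [e / total for e in exps]

      for tok, p in zip(vocab, probs):
          print(f"{tok:>7}: {p:.3f}")

      greedy = vocab[probs.index(max(probs))]            # argmax decoding
      sampled = random.choices(vocab, weights=probs)[0]  # temperature-1 sampling
      print("greedy next token:", greedy)
      print("sampled next token:", sampled)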

  • @satan3347
    @satan3347 8 місяців тому

    Can someone tell me why you would seek to derive tokens from human actions? Why does that need to be tokenized at all? And then I have to ask why this token must be extracted or predicted about the future? WHAT IS THE FUCKING POINT OF ALL THIS?

  • @Takyodor2
    @Takyodor2 8 місяців тому

    "Can surpass human performance" != "AGI"
    Disappointed in the clickbait. Existing AI frequently outperforms humans, but it isn't (close to) AGI.

  • @Lolleka
    @Lolleka 8 місяців тому

    A stochastic parrot that does its job well is indistinguishable from the real deal. Isn't it the same conclusion?

  • @csanadtemesvari9251
    @csanadtemesvari9251 4 місяці тому

    I don't think so Tim

  • @explorer945
    @explorer945 5 місяців тому

    Man, where is he?