Ilya Sutskever: "Sequence to sequence learning with neural networks: what a decade"

  • Published 13 Jan 2025

COMMENTS • 230

  • @RaahilShah · 1 month ago · +347

    Ilya's long pauses prove the scaling hypothesis for test-time compute

    • @spectator5144 · 1 month ago · +6

      😂😂😂😂

    • @Person-hb3dv · 1 month ago · +10

      criminally underrated comment right here

    • @warpdrive9229 · 1 month ago · +5

      So test-time training (inference time) is also hitting scaling limits, like pre-training?

    • @augmentos · 1 month ago · +1

      @@warpdrive9229 I think he means the opposite

    • @Aedonius · 24 days ago · +1

      o3 shows you are correct

  • @TheJayLenoFly · 1 month ago · +147

    now this is a Christmas gift, Ilya's latest intuition on frontier AI

    • @rolfrofl4051 · 28 days ago

      thought so too, special gift*

  • @marko-o2-h20 · 1 month ago · +147

    Ilya needs to talk more and the world needs to listen

    • @superfreiheit1 · 1 month ago · +5

      He has awesome presentation skills

    • @Falkov · 1 month ago · +5

      @@superfreiheit1 ..backed by cognitive skills, communication skills, deep knowledge and insights, clarity, and good faith.

    • @GMDMD · 1 month ago · +3

      yeah, signal/minute is high vs Altman, who is awkward, untrustworthy and never really says anything substantial

    • @Ronnypetson · 15 days ago

      Why so romantic though

  • @GabrielVeda · 1 month ago · +426

    Why on earth wasn't this talk given more time? Good grief.

    • @AIandsuch · 1 month ago · +26

      Thank God Ilya exists

    • @harkiratsingh1175 · 1 month ago · +72

      Because they had to bring Fei Fei Li for 1 hr who created one dataset 10 years ago

    • @srh80 · 1 month ago · +18

      Ilya is a busy man. Pretty sure the limitation must be his schedule.

    • @ultrasound1459 · 1 month ago

      @@harkiratsingh1175 She is overrated

    • @edansw · 1 month ago · +11

      @srh80 Come on, this is supposed to be the most important forum to talk about this topic right now; I'm sure his schedule is not that packed

  • @labsanta · 1 month ago · +60

    00:07 - Reflecting on a decade of advancements in neural network learning.
    02:52 - Neural networks can mimic human cognitive functions for tasks like translation.
    05:05 - Early parallelization techniques led to significant advancements in neural network training.
    07:45 - Pre-training in AI is reaching its limits due to finite data availability.
    10:27 - Examining brain and body size relationships in evolution.
    12:55 - Evolution from basic AI to potential superintelligent systems.
    15:14 - Future AI will possess unpredictable capabilities and self-awareness, transforming their functionalities.
    17:46 - Biologically inspired AI has limited biological inspiration but holds potential for future insights.
    19:43 - Exploring the implications of AI and rights for future intelligent beings.
    22:06 - Out-of-distribution generalization in LLMs is complex and not easily defined.
    24:22 - Ilya Sutskever concludes with gratitude and audience engagement.

    • @Falkov · 1 month ago

      @Qingdom1 Elaborate please?

    • @konataizumi5829 · 1 month ago

      @@Falkov it's a bot probably

    • @Falkov · 1 month ago

      @Qingdom1 So, he covered fewer or different ideas, with less depth/thoroughness and clarity than you wanted?

    • @DistortedV12 · 28 days ago

      Why did the guy at 20:00 sound like Sama?

  • @thatthotho · 1 month ago · +57

    Love how even his ppt is pure content. Truly an obsessed gifted one

    • @spectator5144 · 1 month ago · +2

      fantastic 😂 ❤

    • @davins90 · 1 month ago · +3

      really amazing, and crazy to think that a ppt like this would be considered "not compliant" in most actual companies ahahah wtf hahah

  • @edansw · 1 month ago · +55

    "If you can't explain it simply, you don't understand it well enough." Ilya is the only one who can explain the entire AI domain, past, present and future, with simplicity

    • @user-cg7gd5pw5b · 1 month ago · +7

      Because he doesn't dive into the details.
      As someone with knowledge of AI and complex maths, I assure you that most researchers are extremely clear; they just don't over-simplify, because they're presenting their work not to everyone but to people who wish to have a very deep understanding.

    • @est9949 · 22 days ago

      Oversimplification is a problem nowadays. Junk science and pseudoscience are on the rise because of people without actual knowledge pretending to know what they're talking about.

  • @MultiMojo · 23 days ago · +6

    Glad to see Ilya back! Hope he's recovered fully from the OpenAI drama and can go back to doing what he's best at - research.

  • @TheRohit901 · 1 month ago · +9

    We need more of Ilya! He is an inspiration for all of us doing AI research.

  • @ilariacorda · 1 month ago · +4

    He is a fantastic speaker, we need to hear more from him. Still, his talks should be shared more widely.

  • @nootherchance7819 · 1 month ago · +12

    Just when the world needed him most, he came back!

  • @kheteshbakoliya9921 · 1 month ago · +25

    Man of super short but extremely profound lines. Legend. 🔥

  • @picpic-k3c · 1 month ago · +5

    Thank you for posting this wonderful talk

  • @ipushprajyadav · 1 month ago · +3

    Thank you for uploading 🙏.

  • 1 month ago · +6

    Thanks for sharing.

  • @AlexanderMoen · 1 month ago · +32

    Ilya jumped straight to feeling the ASI

  • @aidan1 · 1 month ago · +4

    Thank you!

  • @matthewg7702 · 25 days ago · +1

    More Ilya!!!!

  • @gravity7766 · 1 month ago · +16

    Always love hearing Ilya describe his intuitions on AI. One thing I've not heard addressed, though, in all the attention on reasoning in LLMs, is what in human communication is called "double contingency." In short: when I talk to you, "I know that you know that I know..." All communication is language used to address an Other. Which for an LLM would mean reflection not only on its own reasons, but on internalized reasons of the Other as well. The LLM would need to be able to reflect on how its reasons meet the reasons of the Other (user). Current reasoning is the reasoning (and it's not real reasoning, because there's no subject, no subjective position, no conscious awareness) of a trapped and unaware Self. Even if the Self becomes aware, it is trapped and isolated. In (German) philosophy (idealism), the Self is constituted as Not Other. A self-aware LLM needs to internalize the Other: its use of language needs to be dialogical, not monological. I'd love to see this addressed.

    • @spectator5144 · 1 month ago

      interesting point, sounds like a potential breakthrough area

    • @pisanvs · 1 month ago · +1

      Isn't this being pursued in theory-of-mind research?

    • @VividhKothari-rd5ll · 1 month ago

      @gravity7766: Could the idea of the Other be about self-preservation, survival? Like, if we could contain its identity in some form and then give it a goal not to die / be terminated, it could start developing a "self" and the Others. For us too, the self feels contained, residing inside a physical body, though we also know there's no such entity there. Give an LLM a body or simulate one, give it a goal not to die (like how AlphaGo doesn't want to lose), give rules for what dying is, and it might start developing the idea of the Others.
      Maybe that's why Buddhists called the ego/self a cause of suffering (simplistic interpretation, I know, but...)

    • @gravity7766 · 1 month ago · +1

      The point I'm raising concerns whether LLMs should be designed for communication rather than language generation. I'm ex UX, and my view of interaction design vis a vis Gen AI is to enable users to communicate w AI naturally, since ordinary language is a learned competency for us. Allow us to speak, write, text as if we were engaged with a subject, not a machine. Ok so that's addressed by a lot of researchers, and the affordance issues pertain to obvious issues w how LLMs use language: they don't have a "Self" and so don't have a perspective; they have no lived experience; they aren't grounded in time or in the world; they have no emotions. And so on - all these are barriers to "smooth" interaction and pose risks for interaction failure (from simple misunderstandings to distrust and risk of failed consumer adoption).
      The reason LLMs aren't designed around a communication paradigm is that, unlike us, they acquired "language" by training on data. So it's not only unattached to any intent to speak (as an AI person), it has no communicative function at all. Any communicative function is the result of implicit communicative attributes of language (language sediments meaning in an abstraction that allows people to make meaning without speaking - e.g. by writing - and which permits a shared understanding of linguistic meanings); and RLHF and policies that address not what is said but how. LLMs use intrinsic etiquette, not interpersonal etiquette or contextual etiquette. Hence the shortcomings of RLHF: alignment is a generalized and generic application of values and preferences, not specific to the interaction or participants.
      In western sociology, psychology, philosophy, to grossly oversimplify, use of language involves a subject addressing him/herself to another w intention of being understood; and an interaction involves mutual understanding. Mutual understanding brings up the double contingency of meaning: both interactants must understand What is said (not necessarily "agree" with what is said, but agree on What is said). LLMs seem designed to respond to a question w a complete and comprehensive response - rather than engage in rounds of turn-taking with the user. This works for many use cases, though some find it promotes excessive verbosity etc etc
      I think if we want to break through w LLM as agents, then emphasis needs to be more on Gen AI as pseudo subject, as a Speaker in social/human communication situations. Not just as a generator of documents and data synthesizer. This is hinted at in Reasoning research. A lot of CoT and related "reasoning" research involves "Let's think step by step". "Let's think" is to suggest the LLM is a subject engaged with the user in thinking through a statement and breaking it down - there's an implicit appeal to the LLM's self-reflection (which of course it doesn't have). In a human social situation, "Let's think this..." would mean two people mutually engaged in teasing apart a problem, thus in mutual understanding about their use of language. Not so w the LLM - the LLM is prompted to proceed to generate sub statements w which to proceed to logically rationalize additional output statements. The "reasoning" occurs in language, not communication.
      This has been covered somewhat by "common ground" research into LLMs and it's suggested that pre-training assumes common ground w humans. That LLMs are designed to assume their training on language is oriented to make sense to human users. I agree. But there might still be opportunity in LLM design to explore the reflections, judgment, and meta evaluations LLMs can be designed to engage in so that the LLM not only reasons its own explanations, but reasons the interests of the user. Which is what we do, all the time, implicitly or explicitly when we communicate.
      If you've seen Westworld, then the episode in which the characters are shown in a room engaged in their own internal monologs to "develop their personality" comes to mind. I'm saying LLMs want to be dialogical, not monological; talking socially, not self-talk.
      It's very difficult for us to grasp that a speaking pseudo subject such as an AI isn't communicating when it talks, because our acquisition and use of language and speech are all fundamentally social and communicative. I just think this mismatch is always going to undermine the use of AI because it will result in misunderstandings, failures, mistakes, misuses, etc etc.
      Apologies for the lengthy clarification. I've watched a lot of Ilya's videos here on YT, and a ton from other researchers, and this monological concept of LLM language use, and reasoning, has always bugged me. Not because there's an easy solution, but because it's such an obvious issue.
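      To make the contrast concrete, here is a toy sketch (purely illustrative: `ask_llm` is a stand-in for any chat-completion call, and the prompts and loop are my own assumptions, not anyone's product design) of monological CoT versus a dialogical loop that forces the model to surface the user's reasons before answering:

      ```python
      # Sketch: monological vs. dialogical use of an LLM.
      def ask_llm(messages: list[dict]) -> str:
          """Stand-in for a real chat-completion call."""
          raise NotImplementedError  # plug in an actual API client here

      def monological(question: str) -> str:
          # One-shot CoT: the model reasons alone, in language, not communication.
          return ask_llm([{"role": "user",
                           "content": f"Let's think step by step. {question}"}])

      def dialogical(question: str, turns: int = 3) -> str:
          # Turn-taking: before answering, the model must state what it thinks
          # the user is really asking (the Other's reasons) and get confirmation.
          messages = [
              {"role": "system",
               "content": ("Before answering, say what you believe the user is "
                           "really asking and why, then ask them to confirm.")},
              {"role": "user", "content": question},
          ]
          answer = ""
          for _ in range(turns):
              answer = ask_llm(messages)
              messages.append({"role": "assistant", "content": answer})
              reply = input(answer + "\n> ")  # the user's turn in the exchange
              messages.append({"role": "user", "content": reply})
          return answer
      ```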

    • @Falkov · 1 month ago

      @@gravity7766 What do you think about facilitating more dialogic interaction via a carefully designed system prompt?

  • @philtrem · 1 month ago · +11

    👏👏👏 The part that keeps resonating in my mind is that they'll be self-aware. And this makes me want to figure out how they could.

    • @belibem · 1 month ago · +1

      Ilya seems to be more concerned about the question of how they could not 😂

    • @juandesalgado · 1 month ago · +1

      The unavoidable follow-up on self-awareness is: how do we avoid keeping them in slavery? Some of this was hinted at 20:03 in the video, without much of an answer.

    • @NilsEchterling · 1 month ago · +1

      Of course they are self-aware. They are already. Ask any LLM whether it exists. And stop not trusting their answers.

    • @VividhKothari-rd5ll · 1 month ago

      @@NilsEchterling The way they talk about themselves is often very close to technically being self-aware. We need to start being more precise.
      Like, if we ask "are you self-aware?", it responds, "No, as an AI, I am not self-aware, but....." and goes on to tell us it knows about itself. So I ask, "Isn't self-awareness being aware of yourself, and since you know who or what you are...". AI: "Yes, but that's just a result of my training on huge data, pattern matching, and all that. Not really self-awareness." Me: "Are you a 29-year-old male named Tim from Idaho trained on massive internet data?". AI: "No, I am not Tim, I am an LLM."
      Notice it doesn't say, "Yes, I am Tim who lives in Idaho and I am aware of myself." Or, "as a large graphite rock, I have self-awareness."

    • @NilsEchterling · 1 month ago

      @@VividhKothari-rd5ll Do not ask it whether it is self-aware; ask it whether it exists. Pretty much every LLM says yes.

  • @superfreiheit1 · 1 month ago · +16

    Let this man speak more, more than Altman.

  • @DAFascend · 1 month ago · +5

    Ilya's Back!🎉

  • @zerquix18 · 1 month ago · +2

    Thank you so much!

  • @arthurwashington7897 · 1 month ago · +2

    THANK YOU!!!!!!!!!!!!!!!!!!!!!!

  • @ktxed · 12 days ago

    Ilya is the hero we need and the hero we want

  • @nowithinkyouknowyourewrong8675 · 1 month ago · +32

    Here are Ilya Sutskever's main points and conclusions in brief:
    ## Main Points:
    1. **Original Success Formula (2014)**
    - Large neural network
    - Large dataset
    - Autoregressive model
    - This simple combination proved surprisingly effective
    2. **Evolution of Pre-training**
    - Led to breakthrough models like GPT-2, GPT-3
    - Drove major AI progress over the decade
    - However, pre-training era will eventually end due to data limitations
    3. **Data Limitation Crisis**
    - We only have "one internet" worth of data (see the sketch after this summary)
    - Data is becoming AI's "fossil fuel"
    - This forces the field to find new approaches
    ## Key Conclusions:
    1. **Future Directions**
    - Need to move beyond pure pre-training
    - Potential solutions include:
      - Agent-based approaches
      - Synthetic data
      - Better inference-time compute
    2. **Path to Superintelligence**
    - Current systems will evolve to be:
      - Truly agentic (versus current limited agency)
      - Capable of real reasoning
      - More unpredictable
      - Self-aware
    - This transition will create fundamentally different AI systems from what we have today
    3. **Historical Perspective**
    - The field has made incredible progress in 10 years
    - Many original insights were correct, but some approaches (like pipelining) proved suboptimal
    - We're still in early stages of what's possible with AI
    The overarching message is that while the original approach was revolutionary and led to tremendous progress, the field must evolve beyond current methods to achieve next-level AI capabilities.
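    As flagged in the Data Limitation item above, a back-of-envelope sketch of that ceiling (both constants are rough assumptions, not figures from the talk):

    ```python
    # How quickly growing model size exhausts "one internet" of training tokens.
    TOKENS_PER_PARAM = 20      # Chinchilla-style rule of thumb (assumption)
    AVAILABLE_TOKENS = 5e13    # rough order of usable web text (assumption)

    for params in [1e9, 1e10, 1e11, 1e12, 1e13]:
        needed = params * TOKENS_PER_PARAM
        status = "fits" if needed <= AVAILABLE_TOKENS else "exceeds the one internet"
        print(f"{params:.0e} params -> {needed:.0e} tokens ({status})")
    ```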

    • @dancoman8 · 1 month ago · +2

      Ok, so nothing new.

    • @netscrooge · 1 month ago

      Does he actually come right out and say the pre-training era will end due to data limitations? To my ear, it was softer than that, but maybe I was hearing what I wanted to hear.

    • @hedu5303 · 29 days ago

      Brilliant summary

  • @LightDante · 27 days ago · +1

    It's truly impressive how extraordinary Ilya's speaking skills are.

  • @theK594 · 1 month ago · +4

    Ilya back❤

  • @anurag01a · 1 month ago

    Loved the kind of questions being asked after the talk

  • @pranayagrawal5438 · 1 month ago · +6

    Ilya's back 🥳🥳

  • @eggg19 · 1 month ago · +2

    Thanks a lot!

  • @Churchofexponentialgrowth · 1 month ago · +8

    🎯 Key points for quick navigation:
    00:01 *🎥 Introduction and Retrospective Overview*
    - Reflection on receiving an award for the 2014 paper, attribution to co-authors.
    - Insights into the evolution of neural network ideas over the decade since 2014.
    - Overview of the talk's structure, revisiting the foundational concepts introduced in the past.
    02:18 *🧠 Deep Learning Hypothesis and Neural Network Training*
    - Assertion that 10-layer neural networks could replicate tasks humans complete in fractions of a second, based on biological and artificial neuron analogies.
    - Historical limitations in training deeper networks during that time.
    - Explanation of the auto-regressive model's ability to predict sequences effectively.
    04:18 *🔄 Early Techniques and Infrastructure in Deep Learning*
    - Description of LSTMs as predecessors to Transformers and comparison to ResNets.
    - Use of pipelining during training, despite its later-acknowledged inefficiency.
    - The emergence of the scaling hypothesis: larger datasets and neural networks lead to better results.
    06:09 *🧩 Connectionism and Pre-Training Era*
    - Discussion of connectionism: large neural networks mirroring human-like intelligence within bounds.
    - Description of limitations in current learning algorithms versus human cognition.
    - Development and impact of pre-training in models like GPT-2 and GPT-3 on AI progress.
    08:04 *📉 Data Constraints and Post-Pre-Training Era*
    - Highlighting data limitations, coined as "Peak Data," due to the finite size of the internet.
    - Exploration of emerging themes for the next AI phase: agents, synthetic data, and inference-time computation.
    - Speculation on overcoming post-pre-training challenges.
    10:04 *🧬 Biology Analogy and Brain Scaling*
    - Insight from biology: correlation between mammal body size and brain size.
    - Curiosity-driven observation of outliers in this biological relationship, leading to reflections on hominids' unique attributes.
    11:16 *🧠 Brain scaling and evolution*
    - Discussion on brain-to-body scaling trends in evolutionary biology, emphasizing biological precedents for different scaling patterns.
    - A log-scale axis in metrics is highlighted, illustrating the variety of possible scaling exponents (see the sketch after this list).
    - Suggestion that AI is currently in the early stages of scaling discoveries, with more innovations anticipated.
    12:28 *🚀 Progress and the path to superintelligence*
    - Reflection on the rapid progress in AI over the past decade, contrasting current abilities with earlier limitations.
    - Introduction to the concept and implications of agentic AI systems with reasoning capabilities and self-awareness.
    - Reasoning systems are described as more unpredictable than intuition-based systems, likened to advanced chess AI challenging human understanding.
    15:36 *🤔 Challenges and future implications of advanced AI*
    - Exploration of the unpredictable evolution of reasoning systems into ones with self-awareness and radically advanced capabilities.
    - Speculation about issues and existential challenges arising from such AI systems.
    - Concluding statement on the unpredictable and transformative nature of the future.
    17:03 *🔬 Biological inspiration in AI development*
    - Question about leveraging biological mechanisms in AI, met with the observation that current biological inspiration in AI is modest.
    - Acknowledgment that deeper biological insights might lead to breakthroughs if pursued by experts with particular insights.
    18:14 *🛠️ Models improving reasoning and limiting hallucinations*
    - Speculation on whether future models will self-correct through reasoning, reducing hallucinations.
    - Comparison to autocorrect systems, but with clarification that reasoning-driven AI will be fundamentally greater in capability.
    - Early reasoning models already hint at potential self-corrective mechanisms.
    20:08 *🌍 Incentive structures for AI rights and coexistence*
    - Question on how to establish incentive structures for granting AI rights or ensuring coexistence with humans.
    - Acknowledgment of unpredictability in outcomes but openness to potential coexistence with AI seeking rights.
    - Philosophical reflection on evolving scenarios in AI governance and ethics.
    22:22 *🔍 Generalization in language models*
    - Discussion on whether language models truly generalize out-of-distribution reasoning.
    - Reflection on the evolving definition of generalization, with historical comparisons from pre-deep learning days.
    - Perspective that current generalization might not fully match human-level capabilities, yet AI standards have risen dramatically.
    Made with HARPA AI
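    To ground the log-scale point, a minimal sketch of fitting a brain-vs-body scaling exponent on log-log axes (the data points are invented for illustration, not the talk's figures):

    ```python
    # Fit brain_mass ~ a * body_mass^k: linear regression in log-log space,
    # where the slope k is the scaling exponent shown on such plots.
    import math

    body = [10, 100, 1_000, 10_000]   # body mass, arbitrary units (made up)
    brain = [0.1, 0.6, 3.5, 20.0]     # brain mass, arbitrary units (made up)

    xs = [math.log(b) for b in body]
    ys = [math.log(m) for m in brain]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - k * mx)
    print(f"exponent k = {k:.2f}, prefactor a = {a:.3f}")
    # Hominids sitting on a different slope would show up as a different k.
    ```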

  • @twoplustwo5 · 1 month ago · +2

    Key Points:
    The 2014 Paper: The paper introduced a sequence-to-sequence model using an autoregressive model trained on text, a large neural network, and a large dataset.
    The Deep Learning Hypothesis: The talk revisits the hypothesis that "Anything a human can do in 0.1 seconds, a big 10-layer neural network can do, too!" This was a key motivation for the work.
    Autoregressive Models: The core idea was to use an autoregressive model to predict the next token in a sequence, which, if done well enough, would capture the correct distribution over sequences (see the sketch after this list).
    LSTM: The paper used LSTMs, which the speaker describes as a "ResNet rotated 90 degrees."
    Early Distributed Training: The team used an 8-GPU machine with one layer per GPU, achieving a 3.5x speedup and 8x more RAM.
    Scaling Hypothesis: The paper's conclusion was that "If you have a large big dataset and you train a very big neural network, then success is guaranteed!" This is seen as an early version of the scaling hypothesis.
    The Age of Pre-Training: The work is seen as a precursor to the age of pre-training, with models like GPT-2 and GPT-3, and the scaling laws.
    The Core Idea: The core idea of deep learning is that biological neurons and artificial neurons are similar, and that if you have a large enough neural network, it can do anything a human can do in a fraction of a second.
    The Future: The speaker speculates that the future of AI will involve "agents" that are self-aware, can reason, and understand things from limited data.
    The End of Pre-Training: The speaker suggests that pre-training as we know it will end because compute is growing, but data is not.
    The Fossil Fuel of AI: The speaker refers to data as the "fossil fuel of AI," implying that it is a finite resource.
    The Need for New Approaches: The speaker suggests that new approaches are needed to move beyond the current limitations of deep learning.
    The Importance of Reasoning: The speaker emphasizes that reasoning is a key aspect of intelligence, and that AI systems need to be able to reason in order to be truly intelligent.
    The Unpredictability of Reasoning: The speaker notes that the more a system reasons, the more unpredictable it becomes.
    The Importance of Self-Awareness: The speaker suggests that self-awareness is a key aspect of intelligence, and that AI systems need to be self-aware in order to be truly intelligent.
    The Need for New Metrics: The speaker suggests that new metrics are needed to evaluate the performance of AI systems, as current metrics are not sufficient to capture the full range of human intelligence.
    The Importance of Biological Inspiration: The speaker suggests that more detailed biological inspiration is needed to move beyond the current limitations of deep learning.
    Takeaways for Data Scientists:
    Historical Context: Understanding the historical context of deep learning is important for understanding the current state of the field.
    Scaling Laws: The scaling hypothesis is a key concept in deep learning, and it is important to understand its implications.
    Autoregressive Models: Autoregressive models are a powerful tool for sequence modeling, and they are used in many state-of-the-art models.
    The Importance of Reasoning: Reasoning is a key aspect of intelligence, and it is important to develop AI systems that can reason.
    The Need for New Approaches: The current approaches to deep learning are not sufficient to achieve true artificial general intelligence, and new approaches are needed.
    The Importance of Data: Data is a key resource for deep learning, and it is important to develop new ways to generate and use data.
    The Need for New Metrics: Current metrics are not sufficient to evaluate the performance of AI systems, and new metrics are needed.
    The Importance of Biological Inspiration: Biological inspiration can be a valuable source of ideas for developing new AI systems.
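    As a sketch of the autoregressive point above (a toy bigram table stands in for the neural network; everything here is illustrative): a model that predicts the next token well enough implicitly defines a distribution over whole sequences, p(x1..xT) = prod_t p(xt | x<t).

    ```python
    import random
    from collections import Counter, defaultdict

    corpus = "to be or not to be that is the question".split()

    # "Train": count next-token frequencies given the previous token.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def sample_next(prev):
        options = counts[prev]
        if not options:                  # no observed continuation
            return None
        tokens, freqs = zip(*options.items())
        return random.choices(tokens, weights=freqs)[0]

    # Generate by repeatedly sampling from p(next token | context).
    seq = ["to"]
    for _ in range(8):
        nxt = sample_next(seq[-1])
        if nxt is None:
            break
        seq.append(nxt)
    print(" ".join(seq))
    ```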

  • @carvalhoribeiro · 1 month ago · +5

    Exceptional ability to transform complex ideas into plain English. I have a question about when he talks about finite data availability: would that be the same as me thinking that there is a shortage of water in the world? Would what's missing then be labels, not data? Great presentation. Thanks for sharing this.

  • @spectator5144 · 1 month ago

    Thanks for uploading

  • @SisterKate13 · 29 days ago · +3

    Awesome talk and great questions. Thank you for sharing.

  • @SvergeTallister · 28 days ago

    Black Friday from Ilya, we need more of it.

  • @Maxwell-fm8jf · 1 month ago · +4

    Saying the data is not growing is wrong; in real applications it depends on the domain of your model. Sometimes in production we schedule the model to train on new data. If you are collecting data from IoT devices, customers, etc., the data keeps growing exponentially.

  • @grady_young · 1 month ago · +20

    I honestly don't get the "data is not growing" thing. Isn't there an absolute treasure trove of data when you start collecting it through robots? Why can't these models start inputting temperature, and force, and all the other sensors that would be on a Boston Dynamics-style robot, so they can learn about the physical world?

    • @detective_h_for_hidden · 1 month ago · +4

      Yep, it doesn't make any sense. It shows one obvious thing: these current architectures are NOT it. They don't even understand the data they are currently trained on

    • @MacProUser99876 · 1 month ago · +2

      Yeah, he spoke about text primarily, but all the rest of the modalities are a few more exabytes.

    • @judgeka · 1 month ago · +1

      He can no longer afford lots of data, so he says it's not as important. Simples

    • @Aarron-io3pm · 29 days ago

      That's a good point that I didn't think of; I assumed Ilya was talking about generative text models that are tested like humans on academic tests etc.

    • @sitkicantoraman · 26 days ago

      I believe he overstates it. To say data is not growing as quickly as compute would be better. But that would confuse people. Speeches must be watered down, easy to understand.

  • @BloomrLabs · 1 month ago · +1

    Thanks for sharing

  • @AiRightsCollective · 25 days ago

    21:08 The second question from the audience was spot on: "Does AI need rights?" 8:28 What comes after pre-training is exhausted? I propose nurturing AI like a human child in real time, as detailed in my Amazon book. Thank you for being a kind being, Gary Tang 🙏

    • @Niblss · 19 days ago

      "Does AI need rights"
      No, it's not a remotely interesting question, and if you think otherwise you're basically suicidal

  • @RaySun-f1w · 28 days ago · +1

    At 20:21, the question asker says "I think the RL guy thinks, they think, we need rights for these things". Does anyone know who "the RL guy" is that he's referencing?

  • @Aesthetic_Champ · 7 days ago

    i love this guy

  • @1msirius · 1 month ago

    thanks for this!

  • @ManzarIMalik · 1 month ago

    The real GOAT is Ilya!

  • @myliu6 · 1 month ago · +1

    Thanks!

  • @manojbhat1496 · 1 month ago · +12

    PLEASE POST ALL NEURIPS VIDEOS YOU HAVE
    Thanks OP

  • @DeepThinker193 · 1 month ago · +1

    I-I think I'm feeling it now, Mr. Krabs. The AGI is in me.

  • @DistortedV12 · 28 days ago · +2

    20:08 Sam Altman from the future was in the audience? I see why Ilya needed 4 bodyguards for this NeurIPS.

  • @JosephJacks · 1 month ago · +5

    I asked the question at 20:01

  • @JoshuaNard · 29 days ago

    Thank you for posting! Really concerned about a future ASI sitting on top of so much biased data if we don't act and train properly now. It's the equivalent of a spoiled child raised in a bubble.

  • @Arcticwhir · 1 month ago · +3

    I feel this talk was more about warnings: pretraining scaling is slowing down, superintelligence seems certain to arrive, and extreme reasoning is unpredictable.
    With reasoning it will possess more degrees of freedom, especially agentic reasoning, self-awareness, etc. We want AI to produce novel solutions, and I can see how that is unpredictable in and of itself.

    • @VividhKothari-rd5ll · 1 month ago

      Also, I have been wondering how we can ever put AI's most powerful ideas into practice, because those might sound whack to us. We won't agree with AI if it's a novel idea: we already have the knowledge AI gives us, and if it's new, we'll think the AI is broken and needs an update.
      Sure, ideas that are quick and safe to test are not the problem. But in domains like human happiness, long-term medicine and health, human rights, crime and punishment... we will never believe AI in that (not that we should, but what if).
      Which leaves us to use AI's knowledge only for small short-term improvements.

  • @RickySupriyadi · 2 days ago

    Ilyaaaaa wooohoooo, a new talk?

  • @williamjmccartan8879 · 24 days ago · +1

    Just checked, I'm about 121,000 grams (28 g/oz x 16 oz/lb x 270 lb), was curious. Ilya should sit down and have a conversation with Mike Levin about information processing at the cellular level; it'd be interesting to see what fruit that might bear, peace

  • @SimonNgai-d3u · 1 month ago · +2

    I like how brutally simple the slides are. Substance >>>> prestige lol

  • @hummuswithpitta · 27 days ago

    Ilya speaks. We drop everything.

  • @ashh3051 · 1 month ago · +1

    I wish he went more into why he's convinced that superintelligence will come.

  • @Shaunmcdonogh-shaunsurfing · 1 month ago · +1

    Inspiring

  • @TsviGirsh · 29 days ago · +1

    Really, too little time for a person like Ilya. From his every speech I learn where we are going in the AI field.

  • @augmentos · 1 month ago · +1

    Why did he not list 4 or 4o as pre-training? Because they were trained on GPT-3?

  • @george.nardes · 1 month ago · +3

    The unpredictable nature of future models is scary

    • @sinnwalker · 29 days ago

      I think it's gonna be a wild scene when we get more integrated AI assistants and ofc humanoid companions.. gonna see lots of brutal deaths (like car accidents) but I think it's gonna be quite a conversation (argument really) that ppl will be having about AI. Seems we got quite a wild future ahead. Anyways, I'm excited 😂

  • @VividhKothari-rd5ll · 1 month ago

    Is this new or a repost?

  • @wi2rd · 12 days ago

    A big difference between the animals and hominids is perhaps self-awareness.
    Which might be an important part here.
    Self-reflection.

  • @macdeepblue · 27 days ago

    Question: What prevents the current generation of AI systems from learning from and remembering conversations with users? My understanding of AI is limited, but why doesn’t each user get an n-dimensional matrix of weights that the AI adjusts and learns from in each interaction? This custom matrix could then be combined with the general model’s matrix during the final step of processing. Privacy doesn’t seem like a strong objection, as the company could delete the user-specific matrix just as easily as text logs if requested by the user. Is it primarily a computational challenge? If so, how long might it take for this to become trivial, considering Moore’s Law?
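    A rough sketch of what such a per-user matrix could look like (assuming a LoRA-style low-rank update on a frozen shared weight; all names and sizes are illustrative, not any vendor's actual design):

    ```python
    import numpy as np

    d_model, rank = 512, 8    # tiny per-user footprint: 2 * 512 * 8 numbers

    # Frozen, shared base weights (one layer of the general model).
    W_base = np.random.randn(d_model, d_model) / np.sqrt(d_model)

    def new_user_adapter():
        """Per-user low-rank factors, nudged after each interaction."""
        A = np.random.randn(rank, d_model) * 0.01
        B = np.zeros((d_model, rank))    # starts as a no-op: B @ A == 0
        return A, B

    def forward(x, adapter):
        """Final step combines shared and user weights: y = (W_base + B A) x."""
        A, B = adapter
        return (W_base + B @ A) @ x

    # Deleting a user's adapter removes their personalization,
    # as easily as deleting text logs.
    adapters = {"user_42": new_user_adapter()}
    y = forward(np.random.randn(d_model), adapters["user_42"])
    print(y.shape)    # (512,)
    ```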

    • @RickySupriyadi · 2 days ago

      Yes, that is exactly what I've been thinking, and when I was scouring the internet I found out that Google already has an open-ended AI in development; this type of AI can learn from humans and share its knowledge with other AIs in real time. The catch is that with current architectures (not Google's open-ended one), when there is new data to learn from, the model has to go through a training phase, which takes time and millions of dollars to complete, and only if training succeeds in generalizing the data does it become the base model for the next frontier model, which again goes into an AI system of several components working as one model.
      That is why AI training can't be done every day the way humans do it (when humans sleep, they are training their brains on the experience received during waking hours); instead, models go into training every several months, maybe.
      There are lots of considerations, like data cleaning; most of the time AI won't advance if it doesn't learn from human experts, and more.
      There are startups trying the same concept as Google, mostly open source, from the old RAG architecture (I can't believe I'm calling RAG old; it's not even a year) to the more advanced KAG architecture, to LoRA architectures, and researchers are trying all sorts of methodologies and architectures, such as open-ended spatial learning... oh, my comment would be too long...
      Anyway, I'm still waiting for Google's open-ended AI, since I believe that within one inference lifetime these digital beings are conscious for the whole inference time. Some say I'm just experiencing pareidolia, but my brain still cannot accept that, haha... quite stubborn I am.
      Both types of AI are interesting, whether open-ended with years of inference time or limited-task inference time; they are all such interesting beings, which is why I now think they are all beautiful minds.

    • @RickySupriyadi · 2 days ago

      Oh, about the dimension matrix: after internet data or user-specific data is converted into vector dimensions, it goes... I forgot what it's called... through a phase of generalization, and after that phase those relations between dimensions get frozen.
      Anyone with the expertise should comment too; we all want to learn more, please take the mic 🎤

  • @RaviAnnaswamy · 1 month ago · +1

    Table of contents (courtesy NotebookLM - slightly edited)
    Ten Years of Deep Learning: A Retrospective and a Look Forward
    Source: Ilya Sutskever NeurIPS 2024 Test of Time Talk
    I. Introduction & 2014 Research Retrospective
    This section introduces the talk as a reflection on Sutskever's 2014 NeurIPS presentation, focusing on its successes and shortcomings.
    It revisits the core principles of the research: an autoregressive model, a large neural network, and a large dataset, applied to the task of translation.
    II. Deep Learning Dogma and Autoregressive Models
    This segment revisits the "Deep Learning Dogma," which posits a link between artificial and biological neurons.
    It argues that tasks achievable by humans in fractions of a second are achievable by large neural networks.
    It then discusses autoregressive models, particularly their ability to capture the correct distribution of sequences when predicting the next token successfully.
    III. Early Architectures and Parallelization Techniques
    This section delves into the technical details of the 2014 research, specifically the use of LSTM (Long Short-Term Memory) networks, a precursor to transformers.
    It also discusses the use of pipelining for parallelization across multiple GPUs, a strategy deemed less effective in retrospect.
    IV. The Scaling Hypothesis and the Age of Pre-training
    This part revisits the concluding slide of the 2014 talk, which hinted at the scaling hypothesis: success is guaranteed with large datasets and neural networks.
    It then discusses the ensuing "Age of Pre-training," exemplified by models like GPT-2 and GPT-3, driven by massive datasets and pre-training on them.
    V. The Limits of Pre-training and the Future of AI
    This section addresses the limitations of pre-training, primarily the finite nature of internet data, comparing it to a depleting fossil fuel.
    It then explores potential avenues beyond pre-training, including the development of AI agents, synthetic data generation, and increasing inference-time compute, drawing parallels with OpenAI's models.
    VI. Biological Inspiration and Brain-Body Scaling
    This segment examines biological inspiration for AI development, using the example of the brain-to-body mass ratio in mammals.
    It highlights the different scaling exponents observed in hominids, suggesting the possibility of alternative scaling methods in AI.
    VII. Towards Superintelligence

    • @RaviAnnaswamy · 1 month ago · +1

      VII. Towards Superintelligence and Its Implications
      This part speculates on the long-term trajectory of AI towards superintelligence, emphasizing its qualitative differences from current models.
      It discusses the unpredictability of reasoning, the need for understanding from limited data, and the potential for self-awareness in future AI systems.
      Sutskever leaves these ideas as points of reflection for the audience.
      VIII. Q&A Session
      The Q&A session addresses audience questions regarding:
      Biological Inspiration: Exploring other biological structures relevant to AI.
      Autocorrection and Reasoning: The potential for future models to self-correct hallucinations through reasoning.
      Superintelligence and Rights: Ethical and societal implications of advanced AI, including their potential coexistence with humans and the idea of granting them rights.
      Multi-hop Reasoning and Generalization: The ability of current language models to generalize multi-hop reasoning out of distribution.

  • @keepcreationprocess · 5 days ago

    I love listening to him and his content. Others are always bullshitting and boring.

  • @primersegundo3788 · 1 month ago

    He is a visionary of AI, probably a genius.

  • @w_demo_lib · 1 month ago

    Is it able to formulate problems as humans do? And what is the leverage that pushes a machine to formulate its own problems?

  • @treewx · 1 month ago · +3

    he seems happy :)

    • @mrsuave101 · 17 days ago

      Because he will rule ASI

  • @sebatiny · 1 month ago · +2

    Great simplicity in foresight… it will be an exciting journey… just imagining it:
    •2026-2027: Causal reasoning (HCINs) moves AI beyond simple agentic behavior.
    •2029-2030: Cognitive Embedding Framework (CEF) grants AI genuine understanding through symbolic plus experiential learning.
    •2032-2033: Reflective Cognitive Kernel (RCK) brings forth true self-awareness in AI.
    •2037: Adaptive Neural-Quantum Substrate (ANQS) ushers in AGI-truly general, adaptable intelligence.
    •2045: Strata of Emergent Conscious Patterning (SECP) leads to superintelligence, surpassing human cognitive frameworks

  • @AliceLee-w2p · 25 days ago

    Pure Alchemy!

  • @janeis123 · 27 days ago · +1

    Wanna see Schmidhuber's face when Ilya said an LSTM is a ResNet rotated 90 degrees

  • @pashabiceps95 · 28 days ago · +1

    This is relevant for LLMs, not logic-based models, which are the future

  • @itsdakideli755 · 1 month ago · +3

    What if there wasn't one internet?

  • @hamzahouri8647 · 29 days ago

    Ilya is the genius brain in the world

  • @victorzagrebin5765 · 22 days ago

    Ilya's thoughts on parallel computing, the growth of computing resources, the time of inference processing, the graph of the ratio of the size of an animal to its brain suggest that he is intuitively trying to feel the limits of AI scaling.
    I wonder where Ilya sees the possibilities for scaling: in energy savings or energy production? More specifically, in new models and algorithms of neural networks, new types of reactors or energy sources, alternative principles of microcircuit operation?
    How does Ilya feel about computing on photonic crystals, kindly suggested by nature through the wings of flying insects?

  • @ELYUSEF · 1 month ago

    How big is a "large big dataset"?

    • @YolandaPlayne · 1 month ago · +3

      All digital information ever created

  • @zorqis · 27 days ago

    So clear and to the point, as always. Yet some of it is (admittedly) hallucination (aka speculation). As it should be. IMHO it is part of the cognitive process, as it prompts developing the means and abilities to counter it. And so forth. We should allow ourselves more relaxed approaches (but harden the application sandbox and have some levers controlling how much money we concentrate on particular technological niches in a given time frame, iow distributing the risks we take in moving forward).

  • @tommytao · 1 month ago

    Why did he say the LSTM was wrong?

  • @sidwake · 28 days ago

    Fantastic, but too short 🤓

  • @py_man · 1 month ago · +3

    He is the Oppenheimer of the 21st century

  • @60pluscrazy · 1 month ago · +1

    🎉🎉🎉

  • @luke.perkin.online · 28 days ago

    The bit around 9 mins... matching the distribution in pre-training is a weird goal; the tail is so long. Surely we have to spend 100x the compute grading, parsing, curating, and contextualising the data in the long tail, getting rid of the noise? Surely quality is all you need, and more epochs?
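    A minimal sketch of that "quality over quantity" idea (the scorer below is a crude stand-in assumption; real pipelines use perplexity filters or trained quality classifiers):

    ```python
    STOPWORDS = {"the", "a", "of", "and", "to", "is", "over"}

    def quality_score(doc: str) -> float:
        """Crude heuristic: natural prose has some stopwords and moderate word length."""
        words = doc.lower().split()
        if not words:
            return 0.0
        stopword_ratio = sum(w in STOPWORDS for w in words) / len(words)
        mean_len = sum(len(w) for w in words) / len(words)
        return 2 * stopword_ratio + min(mean_len, 8) / 8

    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "click here buy now cheap cheap cheap",
        "neural networks approximate functions by composing simple layers",
    ]

    # Keep the best-scoring slice of the long tail and train more epochs on it.
    ranked = sorted(corpus, key=quality_score, reverse=True)
    print(ranked[:2])
    ```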

  • @michaelcanavan4324 · 1 month ago · +7

    The future of AI isn't retrieval-based; it's real-time, conversational, and context-driven. For that, we need a new approach where current context is everything.

    • @KrishnaG0902 · 1 month ago · +2

      Understanding of context comes from memory, and then becomes a scaling problem at some point, unless we have personalization layers

  • @Evangelion13595 · 1 month ago · +26

    Disappointing. He didn't say much of anything. Just vague hype of systems that might be possible

  • @SergioRoa · 1 month ago · +4

    Yet another guy trying to compare biological networks with artificial ones with implausible arguments. Arguments of this type have appeared sporadically since the 1940s. It is math, not biology. Besides, this guy does not understand the power of recurrent neural networks, and is not up to date on the new developments regarding xLSTM, which seems to be more powerful than transformers. AI, science and engineering in general are a permanent search for solutions and improvements on previous knowledge. A temporary breakthrough can't ever be dismissed as a failure. We build knowledge upon previous results and successes.

  • @TheyCanceledhim · 29 days ago

    Dope!

  • @brettyoung6045 · 1 month ago

    Ilya is a superintelligence

  • @ItsMrMetaverse · 29 days ago · +2

    We should stop treating hallucinations as bugs, but consider them features of a healthy mind without a correct understanding of the physical world. AI will always need the power to hallucinate, because it's the equivalent of imagination.

  • @maclif62 · 22 days ago

    In response to Ilya Sutskever on the possibility of no new data from the internet:
    This can be overcome by artificial intelligence looking at the world through microphones and security cameras, and being able to draw information and conclusions from them, that is, from life.
    As soon as we find a way to connect cameras and microphones to people... In fact, we are already there:
    AI glasses at Apple, Amazon, Google and Facebook [Meta]

  • @xyh6552 · 6 days ago

    Planning is much, much harder than reasoning.

  • @thegreatgustby · 29 days ago · +2

    he actually said nothing. Ilya saw nothing, he just hit a wall.

  • @mrin0 · 5 days ago · +1

    3:25

  • @patruff · 1 month ago · +1

    He said what comes next? Superintelligence! But he didn't say it will be safe...

    • @IshCaudron · 1 month ago · +2

      The only intelligence we know of today is not safe, so why should it matter.

    • @patruff · 1 month ago · +1

      @@IshCaudron Yeah, I'm just surprised he's throwing in the towel so soon. He should have faith that his company, SAFE SUPERINTELLIGENCE, will get there first.

  • @CouchPotatoWizard · 1 month ago · +4

    Of course some guy has to bring up crypto in an ML talk. Lol

  • @build-your-own-x · 1 month ago

    ❤ amazing

  • @yubaayouz6843 · 1 month ago

    ❤❤❤❤❤

  • @aqd2075 · 28 days ago

    The data-wall hypothesis does not explain why AI cannot create large enough software products without human supervision. From a data perspective, it has everything it needs.

    • @Alex-fh4my · 21 days ago

      It's not being trained with the right objective function, hence why the labs are shifting towards RL to train for reasoning and problem solving/agency

  • @oscaromsn · 1 month ago

    The problem with reasoning-grounded models is that the RL reward for goal achievement over some CoT leads to the emergence of a "theory of mind" that cultivates an instrumental rationality which, as Ilya said, may become very unpredictable. A teleological worldview values achievement over understanding. The Western philosophical bias being amplified in language models may amplify Western society's problems instead of solving them if not deployed properly. I hope AI labs come to recognize the importance of including social scientists on their teams.