How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype

  • Published 29 Sep 2024

COMMENTS • 661

  • @Lucas-gt8en
    @Lucas-gt8en 3 місяці тому +499

    I think the fact that Zuckerberg sounds vaguely human is the most impressive AI advancement yet

    • @Penrose707
      @Penrose707 3 місяці тому +1

      Oh please, don't celebrate what was obviously a transparent PR move to become "cool" in the eyes of the youth. Broccoli hair and chains... all of a sudden. Ok. He's still the dude who purposefully rocked the Julius Caesar flow for like a decade. Pray tell. What other data does he want to steal from us to flip for profit? Which other brilliant app is he going to ~~create~~... er, buy, which will instigate depression in our population?

    • @dg-ov4cf
      @dg-ov4cf 3 місяці тому +32

      I was just about to comment about how painfully practiced his "normal human being" act is, like you just know he probably paid money to the best speech therapists and performance mentors in the world to coach him on all the mannerisms, the laugh and everything (unless you believe he just started hanging out with a bunch of chill frat dudes).
      Blows my mind how he goes from being (rightfully) clowned 24/7 for his ruthlessness and cold reptilian demeanor to somehow becoming everyone's favorite smiley happy-go-lucky bro, and now you see these comments everywhere he pops up. It kinda makes his feudal worldview even more insulting, because it kind of says he figured normal people were such easily manipulable NPCs that all it'd take was some smiles and a shiny open source model to go from being Mark "They actually trust me. Dumb fucks" Zuckerberg to Mark "Friendly Llama Man" Zuckerberg.

    • @reza2kn
      @reza2kn 3 місяці тому +9

      He's running on llama3-405B :D

    • @martiddy
      @martiddy 3 місяці тому +2

      Vaguely human, lol

    • @Merlin_Price
      @Merlin_Price 3 місяці тому +5

      He still looks exactly like I imagine Ronald McDonald looks without make-up.

  • @jonp3674
    @jonp3674 3 місяці тому +78

    Great video as always. I think one thing is it's really hard to compare machines to humans.
    So a pocket calculator is highly superhuman at arithmetic and really bad at tic tac toe, so how "intelligent" is it compared to a human?
    I also think if you ask undergrads "You have 20 seconds to produce the best answer you can to the following questions"
    "What were the causes of the 7 years war?"
    "How do plants use Boron?"
    "Give an overview of the Mayan religion and cultural practices around it."
    "Translate the above questions into 15 different languages."
    Then yeah clearly the current models are going to absolutely destroy all undergrads, in 20 seconds maybe a specialist in one of these subjects could garble something on one of the questions.
    So yeah it's really complicated. Like Chollet has examples a 5-year-old can do and the models can't, but there are a lot of things the models can do that even expert humans can't. So it's really hard to compare.
    And a model which is superhuman at drug discovery and can't drive or play tic tac toe is still going to change the world massively.

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +19

      Great points. And thanks Jon

    • @alihms
      @alihms 3 місяці тому +5

      It is as if we are working toward one monolithic AI system that can do all those things. That's not the way to go. A simpler and better approach is to have a system that can pull input from various sources (i.e., AI sub-domains), analyse them, and make intelligent decisions and take actions based on that. There should be an AI sub-domain that deals with factual data (how do plants use Boron?), another sub-domain that deals with heuristics and logical reasoning (should I walk towards that hooded guy in that dark alley?), a computational system (playing tic-tac-toe, calculating when an eclipse occurs) and so on.
      By focusing on individual specialised sub-domain AI training, it should be easier and less error-prone to achieve an integrated general-purpose AI.
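
      A rough, purely illustrative sketch of the routing idea described above (the handler names and the keyword dispatch are hypothetical; a real router would more likely be a learned classifier):

      ```python
      # Toy sketch of routing queries to specialised sub-domain handlers.
      # Every name here is hypothetical and only illustrates the idea.
      from typing import Callable, Dict

      def factual(query: str) -> str:
          return f"[factual sub-domain] look up: {query}"

      def reasoning(query: str) -> str:
          return f"[reasoning sub-domain] weigh up: {query}"

      def computation(query: str) -> str:
          return f"[computational sub-domain] compute: {query}"

      SUBDOMAINS: Dict[str, Callable[[str], str]] = {
          "factual": factual, "reasoning": reasoning, "computation": computation,
      }

      def route(query: str) -> str:
          # Crude keyword dispatch, standing in for a proper classifier.
          q = query.lower()
          if any(w in q for w in ("calculate", "eclipse", "tic-tac-toe")):
              return SUBDOMAINS["computation"](query)
          if q.startswith("should i"):
              return SUBDOMAINS["reasoning"](query)
          return SUBDOMAINS["factual"](query)

      print(route("How do plants use Boron?"))
      print(route("Should I walk towards that hooded guy in that dark alley?"))
      print(route("Calculate when the next eclipse occurs"))
      ```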

    • @FortWhenTeaThyme
      @FortWhenTeaThyme 2 місяці тому +2

      Exactly. People will complain about minor hallucinations, but it's like... most doctors even have ~1% hallucinations. For my history teachers it was probably 5%. We're talking about a single brain that knows nearly everything about the world, and we're complaining that occasionally it gets minor details wrong.

  • @daPawlak
    @daPawlak 3 місяці тому +5

    You are the sole YT channel about AI (at least that I know of) that avoids falling into either the hype or the debunk pitfall as far as the current state of LLMs and the near future go.
    I lost interest in a bunch of others recently, as they are just denying the reality of the issues with scaling. Half a year ago I was giving them the benefit of the doubt, but now if you are paying attention you must see it, so if one doesn't acknowledge it, I can't see an excuse for it. At this point it's either ignorance or denial, and yet I only ever hear you, in the YT sphere, talking about the current situation as it is.
    Thank you again for that!

  • @maximefournes9148
    @maximefournes9148 3 місяці тому +28

    I do not understand why people are getting more and more skeptical of scaling laws based on janky conceptual arguments when all the empirical results (for example, Sonnet 3.5) show that they continue to hold. People don't seem to understand logarithms very well, including AI Explained, when they say "these results are not 4 times better". There are so many things wrong with this statement. If the model is 4 times bigger you should not expect "4 times better" results. And these benchmarks are all capped at 100%. A better way to quantify the progress on a benchmark like this would be to look at how much the error rate has been divided.
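
    To make "how much the error rate has been divided" concrete, here is a small illustration with made-up benchmark scores (not any lab's real numbers):

    ```python
    # Compare models by error-rate reduction rather than raw benchmark score.
    # The scores below are invented purely for the example.
    def error_rate_reduction(old_score: float, new_score: float) -> float:
        """Factor by which the error rate (1 - score) has been divided."""
        return (1.0 - old_score) / (1.0 - new_score)

    # Going from 80% to 95% leaves a quarter of the errors: a 4x reduction,
    # even though the raw score only moved by 15 points.
    print(error_rate_reduction(0.80, 0.95))   # 4.0

    # Near the 100% ceiling, raw scores barely move even when errors plummet.
    print(error_rate_reduction(0.90, 0.975))  # 4.0 again, from only +7.5 points
    ```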

    • @anonymes2884
      @anonymes2884 3 місяці тому +7

      To me the video is sceptical of the idea that scaling LLMs will achieve _AGI_ (as many AI leaders have been suggesting) and the improvements in benchmarks say nothing about that (I guess unless you subscribe to the - to me quite naive - position that AGI is just "reaching X% on Y different benchmarks", for some values of X and Y).
      I don't think many are claiming that LLMs have no use as tools but there's a big difference between that and general intelligence - if anything i'd say what the empirical results show is that a tool can score 60, 70 even 80%+ on various benchmarks and _still_ clearly _not_ be intelligent. So it's arguably starting to look more like an article of _faith_ that sufficiently scaled LLMs = AGI (or of course, just straight up self-delusion/cynical hype).

    • @Luigi-qt5dq
      @Luigi-qt5dq 3 місяці тому +7

      Exactly, 95% is 4 times better than 80% (the error rate drops from 20% to 5%). I think at this point that AGI is not such a high bar, if this is human intelligence ahah. And almost nothing scales linearly in cost: a Ferrari costs 5 times as much as a standard car and it is only twice as fast. And it does still make sense for some use cases.

    • @josjos1847
      @josjos1847 3 місяці тому

      You're the one who said it

    • @chadwick3593
      @chadwick3593 3 місяці тому +1

      Based on the original scaling laws, I think we should expect about 50% error reduction for every 10x model size increase.

    • @mikebarnacle1469
      @mikebarnacle1469 3 місяці тому +4

      People are skeptical because S curves look like exponentials, and the benchmarks are not scientific in the slightest.

  • @Noraf83
    @Noraf83 3 місяці тому +1

    It's time for AI companies to publish ARC scores with new frontier model releases.

  • @alainaaugust1932
    @alainaaugust1932 2 місяці тому +1

    Will AI voice ever be able to tell the difference between decans, Declan’s, and deckhands? It doesn’t yet.

  • @CYI3ERPUNK
    @CYI3ERPUNK 3 місяці тому

    Hype typically outpaces most development, but AI development is certainly giving the hype train a run for its money XD

  • @BrianMosleyUK
    @BrianMosleyUK 3 місяці тому +4

    I have a feeling that all this hardware is being built, and so much attention is being given to research that an algorithmic breakthrough will meet the available scale of compute to smash through AGI / ASI. 2027 feels like a good call for that to happen by.

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 місяці тому

      Dreamings.

    • @BrianMosleyUK
      @BrianMosleyUK 3 місяці тому

      @@hydrohasspoken6227 it's just a feeling, maybe you're right and it's a vision. Wishing us all luck for the future.

  • @betabob2000
    @betabob2000 3 місяці тому +2

    Thanks!

  • @shotx333
    @shotx333 3 місяці тому +3

    Man, so slow. Is this because of posting on Patreon sooner?
    Anyway, thanks.

  • @oiuhwoechwe
    @oiuhwoechwe 3 місяці тому +1

    For sure the video will be great. Reasoning? Not so much.

  • @mrpicky1868
    @mrpicky1868 3 місяці тому

    Current models are a one-way conveyor. Obviously they can't adapt to new unique situations, but Google is already working on it. And the continuous learning is the thing that scares me. If they are already close to a working model, an explosion might happen any moment. We will obviously not know about it until it's too late. That model will explode in the testing stage behind not-so-robust lab doors. I have some strong opinions. Put me in a debate with Yudkowsky; I am the only one who has actual counterpoints to doom. Despite the fact that doom is indeed a more and more likely scenario every passing day.

  • @DeepThinker193
    @DeepThinker193 3 місяці тому +1

    Great, nice to see the honeymoon period disappear and reality set in. So real progress can start now.

  • @Hunter-uz9jw
    @Hunter-uz9jw 3 місяці тому +200

    Zuckerberg becoming one of the more normal and balanced figures in tech is a welcome surprise lol

    • @Jm-wt1fs
      @Jm-wt1fs 3 місяці тому +38

      I have a feeling that he doesn’t really care about open source as a principle, but bc he was too far behind the closed source companies with AI development, it was just a business decision that was a smart move by him. Though I will say, regardless of his true beliefs or motives, the field of AI is in a significantly better place than it could’ve been, thanks to the new open source Zucc, man of the people. Who knows, maybe he’ll even open source the code he’s made of one day

    • @mikebarnacle1469
      @mikebarnacle1469 3 місяці тому

      ​@@Jm-wt1fs Facebook always had a great open source reputation, just look at React and the ecosystem they support around it. My personal favorite language Rescript, only really exists because they backed the core contributers. It's more about attracting talent, and I think an authentic recognition of the broader benefits of OSS. I don't like them, but gotta give them credit here. It could be worse.

    • @chrisanderson7820
      @chrisanderson7820 3 місяці тому +5

      @@Jm-wt1fs Everyone else decided to go Apple form factor and he decided to go the IBM PC form factor. Become the substrate, then you can sell Office for $500 a pop later on.

    • @faselblaDer3te
      @faselblaDer3te 3 місяці тому +1

      The balance of the force must always be maintained

    • @AISafetyAustraliaandNewZ-iy8dp
      @AISafetyAustraliaandNewZ-iy8dp 3 місяці тому +2

      I'm pretty confident that his position is going to age poorly.

  • @daveogfans413
    @daveogfans413 3 місяці тому +246

    3:48 - Legit sounds like someone is having a mental breakdown

    • @julkiewicz
      @julkiewicz 3 місяці тому

      A mental breakdown while also having an orgasm

    • @dexterrity
      @dexterrity 3 місяці тому +43

      sounds manic indeed

    • @lowmax4431
      @lowmax4431 3 місяці тому +5

      What is that clip from?

    • @VividhKothari-rd5ll
      @VividhKothari-rd5ll 3 місяці тому +7

      Sounds like old movies

    • @justtiredthings
      @justtiredthings 3 місяці тому +21

      It sounds like they deliberately trained it on "sexy" speaking, but it sounds insane here because it's so exaggerated and false and bc of the pedestrian context. Creepy.

  • @micbab-vg2mu
    @micbab-vg2mu 3 місяці тому +108

    Claude 3.5 Sonnet is great :) I cannot understand why fewer than 5% of people in the large corporation where I work are enthusiastic about generative AI; most of them haven't even tried it.

    • @HighPotentateCanute
      @HighPotentateCanute 3 місяці тому +37

      NFT grifters have poisoned the well, as far as getting the public really excited about new tech goes. Most folks are too tech-illiterate and just want things that work. IMO of course.

    • @verigumetin4291
      @verigumetin4291 3 місяці тому

      @@HighPotentateCanute you live online if you think the average Joe knows what an NFT is. Just like they didn't know and didn't care about NFTs, people don't know and don't care about AI.
      Until it takes their job away. Then they care.

    • @oiuhwoechwe
      @oiuhwoechwe 3 місяці тому

      They don't understand and don't want to understand, because the result is scary for them. Expect pressure for pitchforks and strict regulation by politicians soon. I expect they will create a false-flag event to kick-start that!

    • @sebastianjost
      @sebastianjost 3 місяці тому +40

      The frequent hallucinations are still problematic. You need experience to work around the limitations of most current LLMs. That's time people would need to invest to get the full benefits. In some domains LLMs can be super helpful, but deviating from established workflows is also a significant cost for most people.

    • @GeoMeridium
      @GeoMeridium 3 місяці тому +5

      @@sebastianjost I agree with you, but when you combine recent research with the level of scaling planned for the next wave of models, these errors and hallucinations are going to become a lot less frequent.
      Some hobbyists are already discovering ways of using AIs to create self-checking workflows, so I think that the common complaints about AI are going to fade.
      With that being said, the timeframe of development could get held up by chip production. In a November 2023 DARPA report, it was mentioned that OpenAI's training of GPT-5 had been held back by Nvidia's chip production backlog.

  • @jippoti2227
    @jippoti2227 3 місяці тому +116

    Demis Hassabis says that AI is overhyped in the short term and probably underestimated over the long term.

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +30

      Sounds right!

    • @byrnemeister2008
      @byrnemeister2008 3 місяці тому +13

      He does seem to be the most grounded of all these AI superstars.

    • @alansmithee419
      @alansmithee419 3 місяці тому +14

      An AI youtuber (don't know which one, might have been AIExplained for all I know) did a poll about whether people thought it was over or underhyped, and many people went into the comments to say exactly that in spite of the poll, so that is not a rare opinion.
      Another (slightly less I think) popular opinion is that AI is overall overhyped in AI spaces but massively underhyped in general public spaces.

    • @sandy666ification
      @sandy666ification 3 місяці тому

      What does that mean?​@@aiexplained-official

    • @Rick-rl9qq
      @Rick-rl9qq 3 місяці тому +2

      @@alansmithee419 You may be referring to David Shapiro. He's the one who usually does the polls.

  • @iceshoqer
    @iceshoqer 3 місяці тому +46

    I was wondering when your Claude 3.5 Sonnet video would come out, this took a while!

  • @KitcloudkickerJr
    @KitcloudkickerJr 3 місяці тому +31

    Perfect watch for my spot of tea. The growth of ai this year has been breathtaking. 3.5 sonnet is such a pleasure to work with

  • @gball8466
    @gball8466 3 місяці тому +21

    We are at the dawn of a new era. People are arguing over if it's going to be 2, 5, 10, or 20 years from now. Not some second coming that will never happen, but measurable progress that we get to access in real time.

  • @johnnoren7244
    @johnnoren7244 3 місяці тому +108

    People are saying AI is overhyped, but the actual research being released says otherwise. We've had groundbreaking paper after groundbreaking paper being released this year. If anything, the pace has increased. It's just that you don't see it in the large models, yet. It takes some time to be implemented and tested since it's risky for the big players to make big changes. There is also a lot of money invested in the old technologies. Expect a lot of new players entering that are not tied to legacy architectures and hardware. Things are about to get wild.

    • @kingki1953
      @kingki1953 3 місяці тому +3

      Agreed, lots of research papers get released on arXiv, but I can't even follow a single one because of my lack of understanding of the language and lack of basic knowledge of the current newest LLMs.

    • @Apjooz
      @Apjooz 3 місяці тому

      Machine learning has been eating the world for over a decade now. Doesn't seem to be going away.

    • @fromscratch8774
      @fromscratch8774 3 місяці тому +7

      Disagree. The same hype has made it that no one with any "groundbreaking" paper gets to sit on it for a second. Everyone is scrambling to stay ahead.

    • @squamish4244
      @squamish4244 3 місяці тому +3

      @@fromscratch8774 What he means is that so much of what we already possess is transformative, but it hasn't been integrated into most societal structures yet. We can't know what the results of integrating AI into drug discovery will be, for instance, because it's only been a few years, a literally impossible time frame to get results in anything resembling a human.
      Not that these industry leaders aren't deliberately hyping stuff with no basis to make such claims, of course - sure they are.

    • @SnapDragon128
      @SnapDragon128 3 місяці тому +7

      You're right... after all, even in some weird world where Claude 3.5 Sonnet is the pinnacle of AI forevermore, it will still change the world in incredible ways over the next decade. Civilization has discovered a brand new resource, and it'll take time to figure out how to use it.

  • @MrSchweppes
    @MrSchweppes 3 місяці тому +4

    Until evidence proves otherwise, we can assume that scaling laws are not reaching their limits. It seems that scaling AI models will continue to produce impressive results, or 'wow moments.' However, as Demis Hassabis noted about six months ago, large language models (LLMs) are just 'one of the ingredients' in AI advancement, suggesting that additional breakthroughs will be necessary. Nevertheless, it's clear that increased computational power, combined with more efficient use of that power, will remain key factors in AI progress. Therefore, the strategy of scaling up AI models is likely to remain important. Thanks for the great video! 👍

  • @trucid2
    @trucid2 3 місяці тому +7

    Just a few years ago we were amazed that GPT 3 could add two numbers, a skill it wasn't explicitly trained for. And now only a few years later we're disappointed that the reasoning these models can do isn't yet at adult human level?

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 місяці тому +3

      Yes. Let me tell you why. Because they bought the "AGI soon" thing.

  • @manslaughterinc.9135
    @manslaughterinc.9135 3 місяці тому +75

    Man, Zuck is looking more and more human every day.

    • @alcoholrelated4529
      @alcoholrelated4529 3 місяці тому +7

      That's evidence that superintelligence has taken full control over him.

    • @Dan-dy8zp
      @Dan-dy8zp 3 місяці тому +1

      Once he is indistinguishable from us, the end comes soon after.

  • @mickmickymick6927
    @mickmickymick6927 3 місяці тому +5

    It's wrong to call Claude or GPT-4o 'free'. They offer a limited version for free, with GPT-4o being more restricted, but even with Claude every hour or two I hit a wall and have to stop using it. We wouldn't call test-driving a car getting it for 'free', so it's a shame we swallow these companies' marketing on this one.

  • @amirhussain3028
    @amirhussain3028 3 місяці тому +239

    Scaling AI as a strategy is one which favours monopolistic AI rather than doing the hard work of inventing a better algorithm that could learn and run inference energy-efficiently.

    • @caty863
      @caty863 3 місяці тому +38

      Scaling is the only strategy that is "realistic" by now. A breakthrough in new architecture/algorithm could come any time now, but who knows; maybe never. So, scaling is what we've got now.

    • @mrmooshon5858
      @mrmooshon5858 3 місяці тому +11

      I only partially agree with you. A bigger brain could mean a more intelligent animal. If we look at small examples of neural networks, sometimes you just don't have enough parameters to be able to predict the result at a high enough success rate. LLMs work with many languages, so many languages. And in every language there are so many concepts. If we humans have a brain that's about 20x bigger than the current biggest LLM(as far as I know, maybe I am wrong), then I think it's not quite fair yet to compare the two. That's not to say we can't achieve the same level with the current scale. I am just saying that it is very possible, maybe even likely, that the scale is not nearly big enough for agi.

    • @Kazekoge101
      @Kazekoge101 3 місяці тому +3

      The etched ASIC chips are working on that currently apparently

    • @julkiewicz
      @julkiewicz 3 місяці тому +3

      It's the mainframe vs home PC all over again.

    • @stcredzero
      @stcredzero 3 місяці тому +4

      Someone should work up a calculation on the amount of energy a human being uses for training our natural neural networks up until we're 20 years old as a benchmark for what is possible with regards to efficiency. I know there's also the Landauer limit, but being close to that is a tech level that's getting close to godlike "Clarketech." Being as efficient as a human being is probably a far-off limit from where we are with LLMs and GPUs. But it's a good benchmark for what we could do with hardware and algorithmic improvements. (Paradigm shifts, not just incremental improvement, of course.)
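
      A back-of-the-envelope version of that calculation, assuming roughly 20 W for the brain's average power draw (itself only a commonly cited round figure):

      ```python
      # Rough estimate of the energy a human brain uses over its first 20 years,
      # assuming ~20 W average draw (an approximate, commonly quoted figure).
      brain_power_w = 20.0                    # watts
      seconds_per_year = 365.25 * 24 * 3600   # ~3.16e7 s
      years = 20

      energy_joules = brain_power_w * seconds_per_year * years
      energy_kwh = energy_joules / 3.6e6      # 1 kWh = 3.6e6 J

      print(f"{energy_joules:.3e} J")         # ~1.26e10 J
      print(f"{energy_kwh:.0f} kWh")          # ~3500 kWh
      # Frontier-model training runs are widely estimated in the thousands of MWh,
      # i.e. several orders of magnitude more than this.
      ```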

  • @karlwest437
    @karlwest437 3 місяці тому +31

    I think LLMs are like an artificial cerebellum, great at recognising, memorising and acting on instinct, but not very good at reasoning, so I think the next step would be an artificial cerebral cortex, with the neocortical column structure the human brain has

    • @thirdeye4654
      @thirdeye4654 3 місяці тому +9

      I understand what you want to say, but the cerebellum is mostly responsible for motor functions. I usually think of LLMs as the parts of the neocortex that is correlated with language, like Broca's and Wernicke's. And we need more subsystems to make "consciousness" possible. For example sensory input, memory, maybe also motor functions to give agency.

    • @karlwest437
      @karlwest437 3 місяці тому +2

      @@thirdeye4654 I think that things you need to consciously think and reason about, is done in the cerebral cortex, but stuff that you've done enough times, gets baked into the cerebellum, like motor controls, when you first learn to drive, you have to think through everything, but after a while it becomes automatic, you could say the same thing happens with language, once you've learned enough, you don't really think about it, it becomes completely natural and instinctive, it's only when asked some complex question that you have to kind of sit back and think, and that's when the cerebral cortex kicks in

    • @azertyuiop432
      @azertyuiop432 3 місяці тому +2

      A more apt analogy might be the number of layers in the cortex. We could say that the current LLMs have a very gyrated cerebrum with a vast surface and a great number of connections, but the intrinsic coordination is very lacking; there is no heterogeneity, it is a large homogeneous lump.
      There is no real compartmentalisation with a central coordinator.

    • @karlwest437
      @karlwest437 3 місяці тому +3

      @@azertyuiop432 yes, you could say LLMs are sort of unconsciously dreaming, and it needs some logical filter applying to it, which would be the cerebral cortex equivalent, the LLMs would dream up all sorts of solutions to questions or problems, and the cerebral cortex would analyse them and reject solutions that don't work or are nonsense, in fact hallucinations might be considered random dreaming with no conscious selection applied, essentially I think they need an artificial cerebral cortex to become conscious

    • @musicbro8225
      @musicbro8225 2 місяці тому +1

      @@karlwest437 Yes, primarily the frontal lobe is associated with decision making and problem solving, also reasoning, emotions and personality amongst other things (googled). It's a complicated relationship the cerebral cortex has with its data though, and not simple statistical prediction of patterns, it would seem. More analogue and nuanced surely than typical digital.
      Perhaps this 'data processing unit' could best be coded by AI itself, since its complexity is many convoluted levels beyond mere database manipulation. But I think you're onto it.

  • @bobtivnan
    @bobtivnan 3 місяці тому +16

    I'm a high school math teacher who is optimistic about using AI to improve learning. The fact that it doesn't see the equivalence of q² and (-q)² could be viewed as a mistake, or it could be viewed as an opportunity for students to have conversations and find these mistakes and in the process improve their own understanding. Of course, you need a teacher to vet these instances. But I claim that it can be a great conversation starter and motivator because kids love to find faults.
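
    For reference, the identity the model missed is one line of algebra:

    ```latex
    (-q)^2 = (-1)^2 \, q^2 = q^2 \qquad \text{for every real } q
    ```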

    • @anonymes2884
      @anonymes2884 3 місяці тому +4

      Sure but then you have a very expensive tool (whether we end users pay for it or some venture capitalist) that's effectively doing the same job as "A bad maths book". _Any_ mistake is an opportunity to learn but I don't really think "mistake engine" is much of an achievement.
      (this strikes me as similar to the "brainstorming tool" idea for LLMs proposed by a researcher in the previous video, where they're apparently useful _because_ they make up weird nonsense - hallucinations, we're told, are a _feature_ not a bug. Fair enough but i'm not sure why we need to spend billions of dollars and waste huge amounts of energy to get the same outcome as an hour spent talking to your stoner friend from high school :)

    • @Hexanitrobenzene
      @Hexanitrobenzene 3 місяці тому

      ​@@anonymes2884
      "Brainstorming" you refer to is actually useful in the areas humans have not mastered. Like coming up with objective functions for reinforcement learning of robotic movement.
      LLM + Verifier is a very powerful tandem. AlphaGo, AlphaFold and aforementioned "robot trainer" are all of this type. The problem is, we don't know how to make verifiers as general as LLMs.
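
      A toy sketch of that generate-then-verify loop (the generator and verifier below are placeholders, not any specific model or lab's API):

      ```python
      # "LLM + verifier" pattern: propose candidates with a generative model,
      # keep only those an independent checker accepts. Both functions are stubs.
      from typing import Callable, List, Optional

      def toy_generator(prompt: str, n: int = 8) -> List[int]:
          # Stands in for sampling n candidate answers from an LLM.
          return [13, 22, 35, 41, 56, 60, 77, 90][:n]

      def toy_verifier(candidate: int) -> bool:
          # Stands in for a programmatic check (unit test, proof checker,
          # physics simulator, reward model...). Here: "is a multiple of 7".
          return candidate % 7 == 0

      def generate_and_verify(prompt: str,
                              generate: Callable[[str, int], List[int]],
                              verify: Callable[[int], bool]) -> Optional[int]:
          for candidate in generate(prompt, 8):
              if verify(candidate):
                  return candidate
          return None  # nothing passed; a caller could resample or give up

      print(generate_and_verify("find a multiple of 7", toy_generator, toy_verifier))  # 35
      ```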

    • @musicbro8225
      @musicbro8225 2 місяці тому

      Using AI as a conversation starter seems questionable to me. We already see students using it to do homework because they're more interested in exploiting AI than actually learning stuff. They're happy to abdicate their inheritance of guiding the future to a machine so why are they going to suddenly be inspired to have informed conversations because AI was introduced to them with faulty functionality?
      The potential I see is that it can teach one-on-one, which is huge! It won't cost as much as teachers, so a lot of teachers can be laid off and/or paid less since they're only overseeing now... Sounds brutal, right, but that IS the future, there is no alternative in my mind. Not yet, but not long.
      Question is, will that make for smarter thinking, rational kids? What kind of world are they growing up into? Will critical thought be relevant any more or would they just need to be cooperative and indoctrinated? These are legit questions imo.

  • @Lorem_ipsum_dolor_sit_amet
    @Lorem_ipsum_dolor_sit_amet 3 місяці тому +76

    Pre trained transformer based AIs seem to be a brute force approach to a generalised AI system. Like a model with sufficient training data won't need the ability to genuinely reason if it has enough examples to pull from, assuming it never encounters a novel scenario (which in the real world it will).
    I'd imagine if we ever do get a "true" AGI system, it'll probably require significantly fewer resources than even GPT-3.5, because if a system can reason it would need only a fraction of the data for extrapolating patterns that a current-gen LLM does.

    • @thenextension9160
      @thenextension9160 3 місяці тому +4

      Useful to use to make a more sophisticated form.

    • @blisphul8084
      @blisphul8084 3 місяці тому +2

      I think the real breakthroughs are happening with LLMs that focus on less training data and compute. First, Mixtral, then Phi-3, and now Qwen2 are the ones leading in efficiency. Sure llama 3 does things well, but it's clear they took the brute force approach, which hurts it in novel situations as you've said.

    • @fromscratch8774
      @fromscratch8774 3 місяці тому +1

      100%.

  • @josh0n
    @josh0n 3 місяці тому +10

    What are some of the most promising new approaches?
    - Hinton: time to think
    - Marcus: base knowledge/semantic structures?? IDK exactly
    - Karpathy: Rethinking tokens and streaming input? With the models deciding how to chunk and what to pay attention to??
    - Others: continuous reinforcement through inference - i.e. combining training and inference
    - are these kinda right?
    - what else?

  • @KillTheWizard
    @KillTheWizard 3 місяці тому +43

    “Simply trusting words from the leaders of AI labs is less advisable than ever.” Agreed, and I'm feeling that way about most people with power, across many disciplines.

    • @Raulikien
      @Raulikien 3 місяці тому

      This tool that we are creating to be better than most humans at everything, will not replace you. New jobs (that this thing cannot totally do by definition) will appear. Buy my products. Thanks.

  • @theanonymoushackers1214
    @theanonymoushackers1214 2 місяці тому +4

    My life is totally fcked up. I am on the verge of giving up on everything. No one respects me. The joy from studying science and technology is what is keeping me going. Thank you for your work on AI.

  • @daveinpublic
    @daveinpublic 3 місяці тому +2

    I like this take.
    Some YouTubers and news sites are saying AI will continue on this exponential curve…
    Others are saying it will plateau hard…
    The truth is we don’t know. And we’ll find out very soon. But I like hearing both sides, and knowing that AI finally has a place in our world, and it’s no longer sci fi.

  • @lostinbravado
    @lostinbravado 3 місяці тому +3

    Both. We're at the dawn of a new intelligence explosion. AND the hype has gone too far.

  • @comicipedia
    @comicipedia 3 місяці тому +4

    I think this is the first time I strongly disagree with one of your videos. I think you're falling into the trap of taking these amazing pieces of technology for granted having gotten used to them.
    I was absolutely blown away when I first used GPT 3.5. And in the year and a half since we've had gpt 4, Claude 3, gpt 4 turbo, gpt 4o and now Claude 3.5. Each better than the previous one.
    Models just keep getting smarter. Claude 3.5 is much much smarter than GPT 3.5 and it's only been just over a year and a half, scaling doesn't seem to have hit a wall yet.
    Sonnet 3.5 being 4 times bigger than 3.0 isn't much. GPT-3 used 100 times the compute of GPT-2, and GPT-4 used about 100 times as much as GPT-3.

    • @julkiewicz
      @julkiewicz 3 місяці тому

      It's definitely slowing down. It's marginally better, sometimes worse, in the tests that I performed.

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +2

      Thanks for your perspective. Perhaps a downside of me being so immersed in AI. Still incredible achievements for sure, but incremental upgrades now

    • @comicipedia
      @comicipedia 3 місяці тому +2

      @@aiexplained-official But is it an incremental upgrade? Claude 3.5 is a huge leap ahead of GPT-3.5 in a year and a half. There were over 2 years between GPT-3 and 3.5. Yes, there have been lots of models in between, but that's why the jump feels smaller. Things are actually still moving very fast.

    • @digitalspecter
      @digitalspecter 3 місяці тому +1

      @@comicipedia It is incremental. Yes, the models do stuff quite a bit better, but the stumbling blocks have remained pretty much the same: hallucinations, math weakness, logic problems, creating something actually novel, etc. Yes, they're much more usable now, but there haven't been any fundamental breakthroughs, which is especially damning when contrasted with the constant hype: promises that can't be redeemed without solving problems that nobody knows when, or even if, they will be solved. This is getting pretty close to straight-up lying.

    • @comicipedia
      @comicipedia 3 місяці тому

      @@digitalspecter It's only been a little over a year since GPT-4, and Sonnet 3.5 is much better than the original GPT-4 at both maths and reasoning. 3.5 is even quite a bit better than GPT-4o on the ARC-AGI challenge, which can't be memorised and which is a test of reasoning. 4o only came out a couple of months ago.
      Things are progressing much the same way as before; the issue is people have unrealistic expectations. We've made quite a lot of progress in the past year.

  • @Raulikien
    @Raulikien 3 місяці тому +18

    The thing is, it doesn't really matter if the field slows down by 1, 2, 5 years... We are talking about the ultimate technology here. The fact that it will probably exist within our lifetimes is already an insane fact.

    • @SebastianLopez-nh1rr
      @SebastianLopez-nh1rr 3 місяці тому +3

      Speculation, not fact .

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 місяці тому

      It won't

    • @christopherbelanger6612
      @christopherbelanger6612 3 місяці тому

      @@hydrohasspoken6227 That's a bold claim, and why not?

    • @hydrohasspoken6227
      @hydrohasspoken6227 3 місяці тому +1

      @@christopherbelanger6612 because that bold claim and expectation is mostly based on the hype created by content creators and CEOs with their "AGI soon" narrative.

    • @tgo007
      @tgo007 3 місяці тому

      @@christopherbelanger6612 Money. Tech can only develop through research, aka money. In the short term, investors and businesses are happy to spend that money. Eventually, if investors are not getting a return, they stop investing. Everything is good now and AI has helped do things faster and cut costs. But I think it's gonna hit the wall. We're already seeing it. It's easy to go from 0 to 80; to go from 80 to 100 is very hard. The USA spent 300 billion. Now, from here on out, each additional 300 billion will make it 1% better. Then eventually 0.5% better. Then eventually 0.25% better.

  • @GabrielVeda
    @GabrielVeda 3 місяці тому +2

    This just goes to show how effective repetition is as a tool of persuasion. Gary Marcus has saturated X with his anti-LLM, anti-scale rhetoric and it is clear he is beginning to gain traction on the masses. Scale + better pre-training data has a *lot* still to give. That doesn’t mean we have to rely on scale alone, but dismissing scale at this early stage is just foolish and wrong.

  • @bobrandom5545
    @bobrandom5545 3 місяці тому +2

    Honestly, I think everyone is just missing the obvious. You're all comparing apples to oranges. AI is NOT comparable to human intelligence. So, measuring it against humans is a ridiculous thing to do. In some ways it will be better, in some ways it will be worse. And you'll get extreme cases on both ends. The reason for this is that artificial neural nets and humans are trained COMPLETELY differently on COMPLETELY different data. I'm not sure how the multimodal models get trained on data, but no matter how they do it, it will never compare to how humans get trained on multimodal data. Honestly, I'd love to know how multimodal models get trained, cause I'm quite skeptical about how "multimodal" they actually are.
    Let's look at a human. From birth we continuously process massive amounts of correlating data every second we are awake. This is multimodal CORRELATING data. And it's not multimodal in the same way that we call AIs multimodal. It includes sight, hearing, touch, pain, smell, taste, pressure, balance, temperature, proprioception, kinesthetic sense, etc, etc. These, among quite some others, are verified distinct senses that we process. In short, we experience the world with all our senses, while being "someone". It's very important to note here that all these senses occur concurrently, so they are temporally correlated and processed at the same time by our brains. Furthermore, while we process all this data, we also reflect on it by thinking in real time. Not just thinking about the incoming data, but also thinking about our thoughts about the incoming data.
    You could imagine when you teach a child what's up and down, releasing a ball from some height. They will process massive amounts of data only watching that ball fall to the ground. They'll hear you say "up and down", processing the language, they'll watch the ball fall in 3d, also moving in the 3d space with all their senses, they'll feel the position of their heads and eyes while they follow the ball, they'll hear the sound when it hits the ground, they'll feel their body going down as they grab the ball from the floor, I could go on. So, massive amounts of correlating data, in just 10 seconds or so.
    Now let's look at an LLM, for example. An LLM just gets fed text and is made to predict the next token. That's it. You feed it massive amounts of text and it gets really good at predicting the next token. And, it gets so good that, in its complex neural net there are so many connections between all the tokens that you could say it has some kind of "understanding" of what words and concepts mean. But, it's still just language. How can you expect a language model to understand outcomes of certain physical situations, while it has never actually seen/heard/felt/etc (all our other senses + thinking about those senses), what it "talks" about. Isn't it expected that it's gonna make ridiculous mistakes?
    So then we judge the intelligence of LLMs and dismiss their "intelligence", because they make stupid mistakes that the dumbest human wouldn't make. And on the other side we've got team AI, who are amazed (and probably blinded) by the capabilities, who say that AGI is near, because AI can beat x% of people in some task. Like, really? Just think about it for a second and you'll see that the whole thing doesn't make any sense.
    This has become a whole essay almost, haha. Lemme just end with this. I think we're on to something. I think it's crazy that LLMs are as smart as they are, just being trained on language. The same about other neural networks btw. I think it's obvious that there's some kind of alien intelligence embedded in these models. And I call this intelligence "alien", because it's obviously completely different from that we call intelligence from a human (or even animal) perspective. But, in the end, what is intelligence but recognizing patterns?
    Ok, one thing I want to add. I think a huge difference in machine intelligence and human/animal intelligence is the fact that we can, in real time, reflect on our thoughts. We can combine all our senses, process them, think about that process, think about how we think about that process, and so on, ad infinitum, all in real time, while more information from our senses comes in. It's kinda crazy really.
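
    A minimal sketch of the next-token objective described above, written with PyTorch for illustration (the tiny model and token ids are made up; real training pipelines are vastly larger, but the loss is the same idea):

    ```python
    # Next-token prediction: given tokens [t0..t3], learn to assign high
    # probability to t4. Toy model and data, not any lab's actual code.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim, context_len = 100, 32, 4
    model = nn.Sequential(
        nn.Embedding(vocab_size, embed_dim),            # (1, 4) -> (1, 4, 32)
        nn.Flatten(),                                   # -> (1, 128)
        nn.Linear(embed_dim * context_len, vocab_size), # -> logits over next token
    )

    tokens = torch.tensor([[7, 42, 3, 99, 15]])   # first 4 = context, 5th = target
    context, target = tokens[:, :context_len], tokens[:, context_len]

    logits = model(context)
    loss = nn.functional.cross_entropy(logits, target)
    loss.backward()        # gradients nudge the model toward predicting token 15
    print(float(loss))
    ```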

  • @choltha
    @choltha 3 місяці тому +2

    AI usefulness is overhyped on a short timeframe (6 months), underhyped on a long one (2+ years).
    Maybe if the scaling hype cools down a bit we can integrate the vast amount of research results in other areas, which would unlock new dimensions of capability that are not related to just scale.
    If we don't catch up with this AI integration side, we might get into the weird situation where we have a car with a 1000 kW motor (GPT-4) but we drive like we are on an ice-like surface (chance of hallucinations, not able to quickly adapt to new situations, etc.) and can only go really slowly as a result, not putting the power to good use. Now if we put spikes on the tires (good integration, as mentioned before), there might be a sudden jerk (a jump in end-result capabilities) that catches most people off guard.

  • @dsmogor
    @dsmogor 2 місяці тому +2

    The point is that the whole power of generative LLMs is more of a discovery than an invention. Transformers were created for automated translation, and all the generative capabilities are just an observed side effect that accompanied the scale of training that transformers made possible. Nobody, including the chief scientist at OpenAI, predicted what GPT-3 would be capable of compared to GPT-2, so all the current promises are at best just wild guesses, at worst schemes to keep the share price up.

  • @SirQuantization
    @SirQuantization 3 місяці тому +30

    When I see AI being shown to act strangely like at 3:47 I always wonder what the person did to prompt it. There's no way to tell if they prompted it with, "Start rambling like a crazy person" and then pretended to act shocked. It happens a lot (not always ofc)

    • @julkiewicz
      @julkiewicz 3 місяці тому +6

      Not really, it sounded unhinged at times in the OpenAI demos as well.

    • @anonymes2884
      @anonymes2884 3 місяці тому +2

      No offence intended but when I see sceptical responses to LLMs doing weird stuff I always wonder what LLMs that poster has been using. Because I see them spout nonsense pretty much every time I use one for more than a simple query or two (even without actually _trying_ to trip them up) - in fairness _usually_ "articulate nonsense", that might _sound_ plausible to someone without domain specific knowledge, but still nonsense.
      (so given the millions of people using them every day it's not at all surprising to me that every now and then an LLM will just spout _actual_ gibberish, like we might expect from someone who's high or having some form of mental health episode)

    • @ronnetgrazer362
      @ronnetgrazer362 3 місяці тому

      I saw another example from that leak/botched AB-test/glitch and there was talk of replies being completely unrelated to the prompt.

    • @johnnoren7244
      @johnnoren7244 3 місяці тому +3

      Fake videos/screenshots of AI acting strangely are unfortunately very common because they get many views. Whatever gets views gets faked.

  • @mattshelley6541
    @mattshelley6541 3 місяці тому +3

    Entirely agree, people need to stop deifying these tech leaders. As he points out at the end, he has no real world experience in biology, yet his claims are being retweeted.

  • @AllisterVinris
    @AllisterVinris 3 місяці тому +24

    I think it's a whole. Scale alone won't do everything, but it'll help. Techniques and stuff alone won't fix all the problems; but it can go a long way. Together though ... That's where it's at.

    • @Ecthelion3918
      @Ecthelion3918 3 місяці тому +2

      Agreed

    • @GodbornNoven
      @GodbornNoven 3 місяці тому +2

      Absolutely

    • @Apjooz
      @Apjooz 3 місяці тому +1

      If there were easy tricks we would have found them already. So scale scale scale it is.

    • @AllisterVinris
      @AllisterVinris 3 місяці тому +1

      @@Apjooz I mean yeah, at the very least we can always scale up. But while there isn't any *easy* trick that we haven't found, there still might be more complex and potentially revolutionary tricks left to discover, you never know.

    • @AfifFarhati
      @AfifFarhati 3 місяці тому

      @@AllisterVinris In fact, to my knowledge, the simpler the trick, the harder it is to find; simple =/= easy.

  • @peersvensson9253
    @peersvensson9253 3 місяці тому +5

    Speaking as someone working in research, these tech bros don't have a real understanding of how scientific progress happens and seem to believe in the "lone genius" trope popularised by movies and TV shows. CRISPR, as an example, was not discovered through some exercise of brute force intellect, it was discovered by people doing actual experimentation. Similarly, one of the main problems in modern physics is the fact that we don't have enough experimental guidance for the development of new theories, and less so that people aren't smart enough to come up with new theories.

    • @Frostbiker
      @Frostbiker 3 місяці тому

      The vast majority of the "tech bros" researching AI have, unsurprisingly, a background in academic research. They know how research is performed because it's all they have ever done. My criticism from working with those guys would be that they don't have adequate business experience.

  • @Michael-ul7kv
    @Michael-ul7kv 3 місяці тому +2

    the training is more expensive but they're actually cheaper to run

  • @reza2kn
    @reza2kn 3 місяці тому +2

    I love this man exactly as much as I don't trust Sam Altman.

  • @kylewollman2239
    @kylewollman2239 3 місяці тому +16

    I would guess that OpenAI previewed their advanced voice model when they did to try and take attention away from Google's event the next day, knowing full well that it wasn't going to be released to the public for months. When tech companies start saying that revolutionary things are coming at the end of this year or just a couple of years away, it means they have nothing and are just hyping. That's the most valuable lesson I learned from Elon Musk.

  • @danagosh
    @danagosh 3 місяці тому +2

    It may be hype because of the extreme claims everyone is making but I also don't think they are outlandish. This technology is increasing exponentially. Exponentials are hard to fully appreciate sometimes. As Dario Amodei said at the end, if chips keep improving over the next few years and companies keep throwing money at this, the AI systems could reach another patch of emergent capabilities. And with the number of people now working on the problem, if you throw in another Transformer-like breakthrough, they could become very capable almost overnight.
    I would rather have society start talking about and preparing for a world in which we could reach powerful AI (maybe not AGI but near it) in the next few years than have us all be blindsided if it happens. Imo, people need to start taking it very seriously rather than saying it is just a bubble that will lead to nothing. In fact, I think all people working on alignment would love it if this tech plateaus for a while so more research can be done and also for society to adjust. An extra 5 years would be great, please.

  • @Pizzarrow
    @Pizzarrow 3 місяці тому +41

    Judging simply by the rate and tone of your recent uploads, we all need to accept that the pace of AI progress is slower than we might have thought.

    • @M1ntt806
      @M1ntt806 3 місяці тому +6

      I'm so glad that he made this video and is open and honest about his own scepticisms regarding the recent developments/ lack thereof and the hype around them.

    • @2CSST2
      @2CSST2 3 місяці тому +14

      I personally disagree, not necessarily with the possibility that AI progress is slowing but with what seems like a conclusion people are making about it right now, including in this video.
      I especially don't get how a lot of people right now seem like they think they have a good handle on the limitations of scaling, when it's not something so easily predictable.
      As far as I'm concerned, there aren't any concrete grounds for opinions to change on that point beyond what we knew at GPT-4, since we haven't actually seen a truly new scale since GPT-4.
      That's what will determine it, not guessing about how less or more bullish Suleyman or any other tech leaders are now compared to before, or looking at the incremental improvement of Claude Sonnet 3.5 compared to previous models.
      None of that is solid evidence; let's wait for the same time gap and increase in scale that happened between GPT-3 and GPT-4 before getting all that hasty and drastic in claiming what scaling can and can't do. Anything before then is mostly conjecture.

    • @aisle_of_view
      @aisle_of_view 3 місяці тому +2

      That's good, I want a job for a few more years.

    • @Tomjones12345
      @Tomjones12345 3 місяці тому +1

      @@2CSST2 You mention it but seem to be dismissing it. "It" being Claude 3.5 vs 3: 4 times the training data, but nowhere close to 4 times the improvement. Of course we are going to see diminishing returns by simply throwing more data at the problem. You can argue we don't know yet, and maybe it's not definitive evidence, but Claude version comparisons suggest we might no longer see big leaps with just more data.

    • @Apjooz
      @Apjooz 3 місяці тому +1

      I just love people who talk like they sprung into existence 12 months ago. Unless you are not a human...

  • @scoops2016
    @scoops2016 3 місяці тому +3

    Thanks, succinct and informative as always. I had been waiting patiently for my dose of AI Explained.

  • @oimrqs1691
    @oimrqs1691 3 місяці тому +14

    Is the whole video based on an unproven assumption that Claude 3.5 Sonnet was trained on 4x more data than 3.0 Sonnet? Weirdly skeptical video, wasn't expecting that.

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +14

      It's more I am getting tired of the hype from the leaders. Distracting from the great models

    • @JeffBuckleyFanboy
      @JeffBuckleyFanboy 3 місяці тому +2

      @@aiexplained-official So is it proven that Sonnet 3.5 was trained on 4X the data?

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +3

      Nothing can be proven without insider knowledge, but they did a safety post saying they are currently testing a frontier model (likely 3.5 Opus) which has 4x the compute of its predecessor, so there is limited evidence that Sonnet 3.5 may have a similar multiple.

    • @JeffBuckleyFanboy
      @JeffBuckleyFanboy 3 місяці тому +4

      @@aiexplained-official It could be that Sonnet is simply a checkpoint of the larger model they will be releasing later this year.

    • @41-Haiku
      @41-Haiku 3 місяці тому +3

      @@JeffBuckleyFanboy This seems likely to me. That could mean 3.5 Sonnet was trained on twice as much compute as 3.0 Sonnet, which perfectly comports with the increase in performance. The model performs almost exactly twice as well on each benchmark (in terms of halving failure rates).
      If I'm correct about where Claude 3.5 Sonnet lies on the scaling hypothesis curves, it looks to me like it perfectly matches up with expectations. The scaling laws have held from the word go and we should expect them to continue to hold.

  • @superfeel1275
    @superfeel1275 2 місяці тому +1

    I think if we were able to scale extremely big (like 1000x the hardware and maybe 300-500x the traning data) and that we used 1-character tokens for tokenization, we could achieve an AGI. My reasoning is that at some point, to reduce loss, the model has to "predict" how our world works for the text to make sense. For example, if you put a book on a table and move the table, intuitively the book moves as well. But I doubt there's a specific passage out there that describes this. So the model will see situations where the book is displaced when a room gets thrashed for example, and intuit that to reduce loss, we now have to dedicate some weights to that concept. Of course this is the step after memorization isn't good enough and would probably require tons of scale.
    THOUGH, naive scaling is too unrealistic and expensive. You'd probably want mechanisms that highlight the "weights" that encode different reasonings, or a way to extract reasoning from training data and not just memorize it (which Anthropic has made huge efforts towards, funnily enough). Also, 1-character tokens would solve issues like not being able to reverse a word or find words that end with specific suffixes, and would help generalize "word patterns" in general.
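
    A toy contrast between subword-style and character-level tokenization (the subword split is invented for the example, not any real tokenizer's output):

    ```python
    # Why character-level tokens make letter-level tasks trivial to represent.
    word = "strawberry"

    subword_tokens = ["straw", "berry"]   # made-up subword split for illustration
    char_tokens = list(word)              # ['s','t','r','a','w','b','e','r','r','y']

    print("".join(reversed(char_tokens)))      # 'yrrebwarts' - reversing the word
    print(sum(c == "r" for c in char_tokens))  # 3 - counting a specific letter
    # With subword tokens the model never directly "sees" individual letters,
    # which is one reason letter-level questions trip current LLMs up.
    ```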

  • @chrisanderson687
    @chrisanderson687 2 місяці тому +1

    What if the models are purposely losing to us at tic-tac-toe, the way I do with my young nieces and nephews? You know, to not devastate them with my superior intellect and encourage them a little to keep learning. You know, because these things are heavily trained to "please" us. Because, tic-tac-toe isn't a "task", it is a game, and games have consequences, it's not simple to just say "beat me at chess", because the AI may consider the human player's emotions! Or, more sinister... they could be letting us win on purpose to hide their true intelligence, so we don't shut them down out of fear. How can we be sure that isn't happening?!

  • @chongshaohong2969
    @chongshaohong2969 3 місяці тому +1

    Do you have any comments or plans to do a video on the recent news of Perplexity plagiarizing articles and ignoring robots.txt? Also on Mustafa Suleyman's recent comments on CNBC?

  • @kernfel
    @kernfel 3 місяці тому +1

    Highschooler or undergrad? Sounds about right: Now that they've scraped approximately the entirety of available text data, the LLMs are approximately at the level of an average human, purely in terms of language. I don't think it's warranted at all to assume that, adding more model parameters or more data of similar quality, these models will get substantially more intelligent. To the extent that the training data is garbage (let's assume human output is not exactly the pinnacle of rationality, say), the model internals and outputs are also garbage. To stick with the metaphor: why would anyone assume that running ever more garbage through the training process would lead to gold and diamonds?

  • @AleksandrVasilenko93
    @AleksandrVasilenko93 2 місяці тому +1

    I believe in the long run the AI hype is all true, just not in the time frames given. Eventually we will get true AGI. Is it in a year? No. Is it in 100 years? Also no. Somewhere between 1 and 100 years? Yes, closer to the 1 side of course.

  • @Jack0trades
    @Jack0trades 3 місяці тому +1

    It looks to me like the recent rapid growth in AI performance, based on massive amounts of training data, is limited by that data. Instead of an exponential rise over the course of this epoch, we get more of a sigmoid - rising rapidly, then settling into an asymptotic approach dictated by the limits of that data. We will likely find better ways to extrapolate beyond those limits, but that will require some fundamentally new techniques.

  • @shawnvandever3917
    @shawnvandever3917 3 місяці тому +1

    There is no doubt more than scale is needed. However, scale is important; it seems to organize world models much better. The difference between someone with a low IQ and a high IQ is not architecture, it is better efficiency in the structure and organization of mental maps. So I see how scale can do a ton to make things better. I still believe we need continuous learning, the ability to make many predictions, and the ability to update mental models on the fly.

  • @lucifermorningstar4595
    @lucifermorningstar4595 3 місяці тому +1

    Scaling works, but we need novel architectures that can combine the creativity and generalization inherent in Transformers with reasoning, memory, and self-directed data interpretation and connection.

  • @lucnotenboom8370
    @lucnotenboom8370 3 місяці тому +32

    I don't want "undergraduate level" models that get the basic stuff wrong! I want primary schooler models that actually understand what they're doing!
    To put it differently, in university I took pride in learning differently from other students. Where many studied to pass the test, with the memorization and blind application of knowledge that comes with that mindset, I would learn to understand a topic so that it would become like an addition to my common sense, and then figure out the answers on the fly on the test.
    We should not be scaling mere processing of information, we should be scaling the comprehension of these models, and I believe that at the core, they're not trained for it. They happen to pick up some comprehension, but it's not their main focus.

    • @Josh-ks7co
      @Josh-ks7co 3 місяці тому +2

      Making GPT create a board and play tic-tac-toe in a chat engine is a gimmick to show random edge-case limitations. I am not saying limitations don't exist, just that that's not a good example.

    • @orterves
      @orterves 3 місяці тому

      I think that's the sort of thing Bill Gates is referring to in the interview - meta-cognition capabilities

    • @lucnotenboom8370
      @lucnotenboom8370 3 місяці тому

      @@Josh-ks7co I mean, it gets basic math and logic wrong all the time, which points at there being no real reliable comprehension on which its utterances are based

    • @41-Haiku
      @41-Haiku 3 місяці тому

      @@lucnotenboom8370 Same as you and me, I guess.

    • @chinesesparrows
      @chinesesparrows 3 місяці тому

      I don't want an Altman i want a Primeagen

  • @not_a_human_being
    @not_a_human_being 2 місяці тому +1

    I think we're overestimating humans... Those Nobel laureates aren't some separate breed of human beings; we praise them, we put them high on our pecking order - that's that. Ghost in the Shell answered that question long ago. The question is not when it'll be as smart as us, but when we are going to admit that we aren't as smart as we thought.

  • @DanielSeacrest
    @DanielSeacrest 3 місяці тому +1

    4x the compute doesn't equal 4x the improvement. It doesn't get 4x the score on the MMLU lol, but we can still see the correlations between compute scaling and performance on set benchmarks, and I don't think this correlation has been decreasing. We know at least some of the ramifications of scaling; we can reliably predict the MMLU score of models given specific compute scaling, as an example.
    But another important thing is effective compute. This number takes into consideration data improvements (i.e. better data quality), algorithmic efficiencies, and raw compute scales (explaining it for anyone who doesn't know). Now, we don't necessarily have this information on hand from Anthropic, but raw compute scaling is obviously not the only way to scale.
    And I don't think Claude 3.5 Sonnet got larger, since the cost didn't change and it actually got faster I believe, so it was likely just trained on a lot more data.
    But if anything, the compute scaling we are talking about here is still just GPT-4 level. I do not believe it went far beyond the compute that went into GPT-4, and we can see this in benchmarks and performance (similar reasoning flaws to GPT-4). It is 4x the compute over Claude 3 Sonnet, getting to about GPT-4 scale. And it is kind of disappointing we haven't seen any significant scaling over the GPT-4 class at all yet, but with Claude 3.5 Opus (probably 4x compute over 3 Opus), Gemini 1.5 Ultra and GPT-4.5 (probably 10x compute over GPT-4) on the horizon, I feel like this is going to (hopefully) change soon. Although it isn't that surprising, because from what I recall Anthropic said they wouldn't push the frontier, and I believe they haven't. A Claude 3.5 Opus release would definitely have pushed the frontier, but Claude 3.5 Sonnet is more or less GPT-4 level compute / class with likely better post-training techniques.
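    (A rough sketch of what "predicting benchmarks from compute" rests on: the Chinchilla-style scaling law from Hoffmann et al. 2022. The constants below are approximately the published fits and are used purely for illustration; the extra step of mapping loss to an MMLU score is model-family-specific and not shown.)

    ```python
    # Chinchilla-style parametric loss: L(N, D) = E + A / N^alpha + B / D^beta
    # N = parameters, D = training tokens. Constants are roughly the published fits;
    # treat them as illustrative, not as a way to score any particular model.

    def chinchilla_loss(N: float, D: float,
                        E: float = 1.69, A: float = 406.4, B: float = 410.7,
                        alpha: float = 0.34, beta: float = 0.28) -> float:
        return E + A / N**alpha + B / D**beta

    base = chinchilla_loss(N=70e9, D=1.4e12)      # a Chinchilla-scale run
    bigger = chinchilla_loss(N=140e9, D=2.8e12)   # ~4x the training compute (C ~ 6*N*D)
    print(f"predicted loss: {base:.3f} -> {bigger:.3f}")  # a small, predictable drop
    ```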

  • @DrEhrfurchtgebietend
    @DrEhrfurchtgebietend 3 місяці тому +2

    You are missing something huge here. I'm a physicist. Our brains do not generate accurate world simulations. We have guardrails that help us understand when something's out of place. This is much closer to what a video generator does than what a physics engine in a video game does.

    • @ccash3290
      @ccash3290 3 місяці тому

      We can imagine accurate world models when we think and use those models to make predictions.

    • @DrEhrfurchtgebietend
      @DrEhrfurchtgebietend 3 місяці тому

      @@ccash3290 We can imagine them, but we don't actually have an internal representation which is an accurate physical model. We get close in a lot of aspects but are way off in others. A good example is when people are thrown in movies: they often don't follow a parabolic trajectory like they should. However, since we've seen that so many times in movies, we think that is actually how somebody would move if they were thrown. But they're on wires, which makes it hard to do a parabolic trajectory. There's a lot of interesting stuff going on with our training data

    • @ccash3290
      @ccash3290 3 місяці тому

      @@DrEhrfurchtgebietend I agree. We have an inaccurate world model because it is good enough for survival.
      It is not good enough for human-level thinking or labor.
      AI models need a mostly accurate world model because that is the level humans produce when we work.

    • @DrEhrfurchtgebietend
      @DrEhrfurchtgebietend 3 місяці тому

      @@ccash3290 Okay, sure, I don't disagree with that. My point is that it's not the same as having a world model like a physics engine in a video game, where it's actually calculating the things. If you look at some of the AI-generated stuff, sometimes the angles don't stay consistent as it moves about, because it's not really modeling a three-dimensional thing and rendering it as it moves. To our eye, that would be very jarring. What AI is doing is building a model to achieve a goal, similar to what we do but different because the goal is different, and neither of us is producing accurate world models.

  • @omniopen
    @omniopen 2 місяці тому +1

    One thing I’ve noticed with the biggest LLMs is how having them come up with an answer and then code the process in Python drastically improves the numerical and analytical accuracy of their solution. I'm not entirely sure what’s going on there, but I’ve had them easily convert handwritten numbers to digital values, create lists in Python from those values, and then perform data analysis with surprising consistency and high accuracy. However, when you do not prompt them to answer the question in this manner, the results they generate seem to be inconsistent and unreliable.
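    A minimal sketch of that prompting pattern (the `ask_llm` function is a stand-in; in practice it would be a chat-completion call, and the point is that the Python interpreter, not the model, does the arithmetic):

    ```python
    import statistics

    def ask_llm(prompt: str) -> str:
        # Stand-in for a real chat-completion call. The canned reply mimics the kind
        # of code the prompt asks the model to return.
        return (
            "values = [12.4, 7.9, 15.1, 9.8]\n"
            "print('mean =', statistics.mean(values))\n"
            "print('stdev =', statistics.stdev(values))\n"
        )

    PROMPT = (
        "Read these handwritten values: 12.4, 7.9, 15.1, 9.8. Do not compute anything "
        "in prose; return only Python that puts them in a list called `values` and "
        "prints the mean and standard deviation using the statistics module."
    )

    generated_code = ask_llm(PROMPT)
    exec(generated_code, {"statistics": statistics})  # the interpreter does the maths
    ```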

  • @XOPOIIIO
    @XOPOIIIO 3 місяці тому +1

    You can scale toward a dead end as long as you wish, but real progress is impossible without new algorithmic breakthroughs. We haven't seen anything like transformers since 2017.

  • @HorizonIn-Finite
    @HorizonIn-Finite 3 місяці тому +2

    7:35
    Actually, the models pass if you tell them it’s not a trick question or riddle. And yes, I tested the shortened farmer riddle on a coworker and they started thinking of the whole riddle.
    The moment I said it’s not a trick, they said, “once, right?”

  • @sentel140
    @sentel140 3 місяці тому +1

    Will the video models be used for anything other than scams, stock footage, and pornography?

  • @awakstein
    @awakstein 3 місяці тому +3

    This is the most underrated channel on YouTube! I use Kling and it is superb; the only issue is that it only generates 5 seconds

  • @prolamer7
    @prolamer7 3 місяці тому +2

    Hype is a real problem, but AI is not near its limits. All the models we are using are designed for CONSUMER LEVEL HARDWARE, including GPT-4. Yes, Nvidia DGX is expensive, but it is still pro-sumer grade hardware. Imagine models designed for real supercomputers, with 100x more parameters and beyond. I am sure at that moment scaling laws will kick in again...

    • @MrSchweppes
      @MrSchweppes 3 місяці тому +1

      GPT-4 is designed to work in the cloud. Today no consumer-level hardware can run GPT-4. I 100% agree with you regarding scaling laws.

    • @prolamer7
      @prolamer7 3 місяці тому

      @@MrSchweppes An H100 DGX system can run it, if we're talking about the Turbo or 4o version.

    • @MrSchweppes
      @MrSchweppes 3 місяці тому

      @@prolamer7 Do you consider an H100 system consumer-level hardware? :) It costs several hundred thousand dollars.

    • @prolamer7
      @prolamer7 2 місяці тому

      @@MrSchweppes If you have the money you can put it on your table, so it is "pro-sumer", but you can't exactly build a datacenter at your home :) but I get you!

  • @techgiantt
    @techgiantt 3 місяці тому +1

    When Llama 3 came out I thought some of these people would understand that scaling is not the solution 🤦🏼‍♂ Most of them also mistake knowledge for reasoning and the application of knowledge. There's obviously a reason why, even though computers are more reliable and consistent than humans, these LLMs find it difficult to be consistent when exposed to large amounts of data (e.g. the Claude example you showed).

  • @elitegamer3693
    @elitegamer3693 3 місяці тому +2

    I think current models are quite intelligent, but not undergrad level, as they become incoherent within a short time and hallucinate a lot. We need big architecture and algorithm breakthroughs to solve current roadblocks, more than raw scaling.

    • @aisle_of_view
      @aisle_of_view 3 місяці тому +1

      I know a lot of undergrads who hallucinated once or twice. A lot.

  • @absiddi.7712
    @absiddi.7712 3 місяці тому +8

    The LLM world is long overdue for a shift away from transformers towards an architecture similar to our own. Anything else will simply create tools, not agents.

  • @davidh.65
    @davidh.65 3 місяці тому +1

    Sobering video. Less hype and more shipping please. The people have had enough of pie in the sky promises

  • @awakstein
    @awakstein 3 місяці тому +1

    Even with a VPN, I cannot access Claude 3.5 in China - sucks

  • @johnnoren7244
    @johnnoren7244 3 місяці тому +1

    I hope OpenAI doesn't "align" the voice mode too much. I want my AI assistant to be a bit quirky, an AI assistant devoid of personality is very dystopian.

    • @byrnemeister2008
      @byrnemeister2008 3 місяці тому +1

      One with a quirky personality gets very tiring as well. We all know THAT person in the office.

  • @MichaelRicksAherne
    @MichaelRicksAherne 3 місяці тому +1

    I have some doubts about the reasoning advancing as fast as they predict.

  • @Slayer666th
    @Slayer666th 3 місяці тому +1

    I wonder if anyone has investigated how these AIs perform if you combine them.
    Like give Claude, GPT, and Llama the same task, then let them check each other's answers for errors and combine the results to get the most correct one.
    That step alone would probably decrease the errors a ton.
    Don't know what the research says about it, though.
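    (A minimal sketch of that cross-checking idea, with dummy stand-ins for the model calls; real use would wrap each provider's API and normalise the answers before voting:)

    ```python
    from collections import Counter
    from typing import Callable, Dict

    def majority_answer(task: str, models: Dict[str, Callable[[str], str]]) -> str:
        """Ask every model the same task and return the answer most of them agree on."""
        answers = {name: ask(task) for name, ask in models.items()}
        best, count = Counter(answers.values()).most_common(1)[0]
        if count == 1:
            # No agreement: this is where a second "check each other's work" round
            # (showing each model the others' answers) could go.
            return f"no consensus: {answers}"
        return best

    # Dummy stand-ins so the sketch runs; replace with real API wrappers.
    models = {
        "claude": lambda q: "42",
        "gpt":    lambda q: "42",
        "llama":  lambda q: "41",
    }
    print(majority_answer("What is 6 * 7?", models))  # -> 42
    ```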

    • @aiexplained-official
      @aiexplained-official  3 місяці тому +1

      This is SmartGPT 2.0, which I put on Patreon! Works nicely for many tasks

  • @ReflectionOcean
    @ReflectionOcean 3 місяці тому +2

    By "YouSum Live"
    00:00:00 Advancements in AI video generation
    00:01:20 AI models trained on minimal video data
    00:02:30 Scale's impact on AI model accuracy
    00:03:32 Challenges in AI reasoning and understanding
    00:09:04 Future potential of AI models
    00:13:10 Caution against blind trust in AI advancements
    00:16:31 Uncertainty in AI's future capabilities
    By "YouSum Live"

  • @davidpark761
    @davidpark761 2 місяці тому +1

    i cannot eat, sleep, or breathe until you drop another video
    PLEASE!!!!!!!!!!!!!!!!!! I NEED MORE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

  • @jonnyspratt3098
    @jonnyspratt3098 3 місяці тому +9

    "Agentic models won't be possible until they are 2 orders of magnitude greater" :D
    "... so another 2 years" D:

  • @thewebstylist
    @thewebstylist 3 місяці тому +1

    My videos aren’t coming out anywhere near the quality they showcase

  • @7TheWhiteWolf
    @7TheWhiteWolf 3 місяці тому +6

    Regardless of what happens, I’m excited for whatever is coming next!

  • @triplea657aaa
    @triplea657aaa 3 місяці тому +27

    Bill Gates' statement on metacognition is EXACTLY what I've been trying to articulate with regard to how we need to proceed.

    • @11Petrichor
      @11Petrichor 3 місяці тому +6

      By the time we solve this (he was saying by 2027, or soon thereafter), will it be AGI or ASI? Crazy times!

    • @julkiewicz
      @julkiewicz 3 місяці тому

      @@11Petrichor People were saying similar things after Deep Blue won against Kasparov. It's really hard to predict scientific breakthroughs. And right now we need a major scientific breakthrough which hasn't happened for a couple of years now.

    • @Interpause
      @Interpause 3 місяці тому

      Same thing I wrote in my shower-thought notes about data that is missing from the internet... must be a really obvious idea, but still gonna pitch it to my prof anyway

    • @LiveType
      @LiveType 3 місяці тому +2

      Yep, that was my intuitive answer for how to make the model coherent when I played around with GPT-3 when it released. Essentially Q*-style reasoning/prompting. It was far too inconsistent. Fine-tuning was MUCH more effective.
      However, there are 2 fundamental issues that need to be solved.
      1. The n^2 scaling of current models needs to be reduced to n log n or even n (see the sketch below).
      2. Inability to modify weights "in real time". Not sure if this is actually an issue, as it could potentially already be solved with monstrously long context windows.
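      (On point 1, a toy single-head example of where the n^2 comes from: vanilla self-attention builds an n-by-n score matrix between every pair of tokens, so memory and compute grow quadratically with context length.)

      ```python
      import numpy as np

      n, d = 1024, 64                          # sequence length, head dimension
      Q = np.random.randn(n, d)
      K = np.random.randn(n, d)
      V = np.random.randn(n, d)

      scores = Q @ K.T / np.sqrt(d)            # shape (n, n) - the quadratic part
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)
      out = weights @ V                        # shape (n, d)

      print(scores.shape)                      # (1024, 1024): doubling n quadruples this
      ```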

    • @netscrooge
      @netscrooge 3 місяці тому

      Gates said something reasonable --- okay --- but when did he become an expert on AI? Have I missed something?

  • @tautalogical
    @tautalogical 3 місяці тому +1

    I don't think it's over hyped. If you have kids you will know that reason is something that emerges over time and quite slowly. The world is screaming at us, through so many different signals that intelligence is a natural product of certain kinds of system at scale. There might be a few missing tricks, but the bitter lesson is correct, we can blast through without those tricks.

  • @Josh-ks7co
    @Josh-ks7co 3 місяці тому +1

    "Predictably better" and "not as good as other avenues" are not contradictory statements. I wonder if GPT would get that logical inconsistency wrong.

  • @spacexfanclub6529
    @spacexfanclub6529 2 місяці тому +2

    I have been watching and learning, sometimes in horror and sometimes in amazement, from all of the videos you upload on this channel, and just want to say: you are a wonderful human being doing extremely helpful work for humankind by cutting through all the clutter and hype and delivering a truly authentic one-stop place for non-AI people to keep genuine track of reality as AI continues to advance. Much, much love from India!!

  • @dr-maybe
    @dr-maybe 3 місяці тому +2

    17:25 loved how Dario interrupts you talking about how these companies are pressing ahead even though shit is insanely dangerous

  • @jasongrig
    @jasongrig 3 місяці тому +1

    We need a detailed discussion of how rigged the benchmarks are

  • @chrisanderson7820
    @chrisanderson7820 3 місяці тому +12

    I find AI as a whole concept to be highly variable. I've seen it do stuff that is literally at PhD level and of massive use to people and businesses, then act like a mushroom. Sadly it's hard to tell when it's going to be one or the other. Maybe in the short term we need to identify exacting and narrow use cases where we KNOW it will act like a PhD, and not try to just use it for everything.

    • @anonymes2884
      @anonymes2884 3 місяці тому +5

      Exactly. If it could _reliably_ deliver even "smart undergrad" level performance that would be a major advance IMO. Right now it vacillates between post-doc and nursery school.

    • @mikebarnacle1469
      @mikebarnacle1469 3 місяці тому +1

      These analogies don't really make sense. You could make the exact same observation about a calculator from the 80s. Those also alternated between PhD and nursery school level intelligence, because that's what a specialized tool looks like. But the calculator isn't anywhere close to being more than a calculator. The only difference now is that we don't know how learned networks work, which should be even less confidence-inspiring; but because humans love to believe in magic, they see it the other way and overestimate the trajectory. Scaling improves capabilities because there is more memorization. Humans don't work that way; we don't just memorize more and get smarter. Memory and intelligence are separate things for us.

    • @chrisanderson7820
      @chrisanderson7820 3 місяці тому +1

      @@mikebarnacle1469 Your syntax is a bit scrambled so I am not entirely sure of the direction you are going. When I say PhD I mean asking the LLM to diagnose complex, rare medical problems by asking you, the user, questions to formulate its diagnosis, or passing the bar exam, a calculator from the 80s cannot do medical diagnoses or pass human exams. Yet at the same time it tells you to use glue on your pizza. It shows LLMs have massive knowledge model pattern recognition skills and zero common sense, they aren't sufficiently self-examining or completely referential in order to give reliable answers.

    • @mikebarnacle1469
      @mikebarnacle1469 3 місяці тому

      @@chrisanderson7820 The point is that calculators from the 80s can do basic arithmetic faster, and more accurately, than any PhD mathematician. Yet they perform as well as or below nursery school kids when tasked with writing a formal proof. All specialized tools have this property, and it means nothing; it shows that the original analogy between human education levels and LLM capabilities is a meaningless observation, and not surprising - it's expected for a specialized tool. It's only if you drink the Kool-Aid that you are surprised when the Clever Hans and ELIZA perception biases fail to meet practical real-world expectations.

  • @novantha1
    @novantha1 3 місяці тому +1

    I can't shake this sneaking suspicion that we're overdue for some form of paradigm shift which will be deceptively simple once unlocked, very analogous to how something like the Transformer architecture feels self-evident nowadays.
    I think there's basically three areas it could happen:
    Autoregression. Current models autoregressively predict tokens, but that's not really how people work. We can non vocally and implicitly reason about things before answering, and produce a "world simulation", getting feedback from that simulation before answering. There is kind of a "layer" between the tokens we predict and the answer we give. Some people have tried to bridge that with things like scratchpads, which helped, but I wonder if there's not a more fundamental shift there. Perhaps some sort of latent or implicit linear regression which processes data before doing the autoregression. Or, maybe it's something simpler. Maybe instead of predicting the "next" token, we just need an output token embedding that lets the model choose where to put the token, or when to overwrite an existing token.
    Training dynamics. We still use gradient descent to this day, but it heavily limits the architectures we can train, and the ways we can train them. Something like a recurrent Transformer, or an architecture which is to a Boltzmann network what Transformers were to a feedforward network, or a spiking neural network, or something to that effect might be part of the answer. It might be that there's a training dynamic which allows a model to backpropagate its insights from inference rather than the inferred tokens (for instance, the ability to produce a long chain of reasoning and then to backpropagate the insight gained from that chain, rather than the chain itself), or perhaps we need something simpler; it could just be that we need a raw model and an adapter model. The raw model processes information as a raw completion engine, similar to a base model (non-instruct), with an adapter which converts that reasoning to instruction following, and the model does continual learning by adjusting weights of the completions component, while only high-quality instruction-following data goes to the instruct component. I'm not sure, but I think something that allows a model to reason about an undefined or open-ended problem and backpropagate that information could be the key we needed. Imagine being able to tell a model "solve this math problem" without necessarily having the answer ahead of time. It would heavily change the way we could train models.
    Data patterns. I'm still not totally sold on this because I don't completely understand it, but I think the authors of "Human-like systematic generalization through a meta-learning neural network" were onto something, though I don't know exactly what they're onto. Regardless, I think the slew of papers on grokking and the one I listed here note something very interesting when taken as a package: LLMs can reason and generalize, but just feeding them more data from the internet doesn't produce full generalization of all the concepts contained within it, and we might require different types of (presumably synthetic) data, which are easiest to predict with generalization rather than memorization. I don't claim to fully understand the mechanisms involved or the shape the data would take, but I think this is not an unreasonable supposition.
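    (For anyone unfamiliar with the autoregression point above, a toy sketch of strict left-to-right decoding: the model only ever sees the tokens emitted so far and adds one more, with no separate inner "world simulation" step. The token-picking function here is a dummy stand-in for a real forward pass.)

    ```python
    import random

    VOCAB = ["the", "cat", "sat", "on", "mat", "<eos>"]

    def dummy_next_token(context: list[str]) -> str:
        # Stand-in for a model forward pass over the context so far.
        random.seed(len(context))          # deterministic toy behaviour
        return random.choice(VOCAB)

    def generate(prompt: list[str], max_new: int = 10) -> list[str]:
        tokens = list(prompt)
        for _ in range(max_new):
            nxt = dummy_next_token(tokens) # one token at a time, conditioned on the past
            tokens.append(nxt)
            if nxt == "<eos>":
                break
        return tokens

    print(generate(["the", "cat"]))
    ```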

  • @wisdomking8305
    @wisdomking8305 3 місяці тому +2

    Less than 4 hours ago, Philip released an AI course, and yes, I have watched all the modules in full, completed all 9 quizzes, and have already tested the model in a dozen ways.

  • @bambabam1234
    @bambabam1234 3 місяці тому +1

    What do you think about the recent MatMul paper for computing LLMs without Matrix Multiplications?

  • @MatthewKelley-mq4ce
    @MatthewKelley-mq4ce 3 місяці тому +4

    Sonnet is helping me quite a lot.

  • @jamesyoungerdds7901
    @jamesyoungerdds7901 3 місяці тому +2

    Another gem, thanks Philip! Couldn’t help but think - couldn’t an agentic flow or strategy boost things like your math problem results? Layering in checkers, supervisors, verifiers, etc. before the output might have really levelled up your example.

  • @Oliver_w8
    @Oliver_w8 3 місяці тому +25

    I've always thought it was nonsense to try to say that one of these models is like a "smart highschooler" or an "undergraduate" because the models always have sophisticated databases containing very advanced material - for instance, you could even ask GPT-3 questions about PDEs or measure theory and it would reproduce accurate definitions from textbooks that are presumably in its training data. But the basic reasoning hallucinations, sheer amount of data required, and slow progression indicate very clearly that the 'mental faculty' of an LLM is so dissimilar to that of a human there is a certain sense in which a newborn child is leagues "smarter" than even the most advanced models.

    • @vaevictis3612
      @vaevictis3612 3 місяці тому +3

      The problem with this approach is that we *do not know* how the human brain "stores" and "reproduces" data. GPT-3 or whatever does not simply enter some textbook database and copy text from there. The data is not simply "stored" inside the model, even if it can almost reproduce it word for word. When a human expert remembers something, they also do not simply "read" from an imaginary textbook cheatsheet. In this regard, the human brain is simply lagging behind modern LLMs in terms of the ability to store and reproduce data.
      However, I do agree that the full cognitive algorithm (whatever it is, in either the human brain or a silicon analogue) in the current iteration of LLMs is far from ideal. Consider that when a human expert tries to remember something, they can see that there are different possible answers to the question. The human then tries to carefully traverse these possible answers, to choose the one that is most self-consistent and that covers the most possible options. But LLMs try to shortcut to the shortest possible answer that could be perceived as correct. As Andrej Karpathy once said (I think?), LLMs are not so much knowledge machines as imagination machines. They are more "concerned" about the act of answering than about the answer itself.

    • @GodbornNoven
      @GodbornNoven 3 місяці тому

      There is no formal definition of intelligence. A newborn is not smarter than the SOTA LLMs.

    • @SeventhSolar
      @SeventhSolar 3 місяці тому

      I find hallucinations, at least, to be very human. Not only are hallucinations a symptom of many mental injuries and diseases, healthy people hallucinate all the time. It's well known that memories can be completely fabricated by your own brain to fill in gaps.

    • @andersberg756
      @andersberg756 3 місяці тому

      Yeah, we shouldn't try to understand LLMs so much in human terms, but by their strengths and weaknesses on their own. People err a lot with this and get disappointed: "how can it write so well but still lie?"
      It's like learning about dogs in order to use them as tools, e.g. in police work.

    • @mgscheue
      @mgscheue 3 місяці тому

      Agreed. It’s not a meaningful measure. François Chollet discusses this in detail on Sean Carroll’s Mindscape podcast.

  • @lpls
    @lpls 2 місяці тому +2

    I love how you put all the references in the description.

  • @KayButtonJay
    @KayButtonJay 3 місяці тому +14

    The limitations are always going to be the training set. Anything not in training will not be queryable post-training. Additionally, the transformer / attention architecture will not be capable of AGI-like reasoning. It requires better architectures

  • @smittywerbenjagermanjensenson
    @smittywerbenjagermanjensenson 3 місяці тому +1

    Overhyped in the short term, underhyped in the long term

  • @michaelwoodby5261
    @michaelwoodby5261 3 місяці тому +1

    I suspect it's partly scaling, but the huge increases in efficiency are going to make it reliable.
    If you've got a model that's right 92% of the time but runs at a quarter of the cost, you can just run it a couple of times and see if the answers agree. If it decides there's an issue, it can then run a couple more times until one answer gets a majority of the votes.
    A simple request may only have to run twice, a complex one many times, but it could scale automatically. It could also be scaled manually by the customer (how much would you like to spend on quality checks?) without new tech, beyond an editor model or a think-step-by-step logic model where needed.
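    (A minimal sketch of that adaptive quality-check loop: keep re-sampling a cheap model until one answer holds a clear majority, spending extra calls only when the first answers disagree. The `ask` callable is a dummy stand-in for a single sampled model call.)

    ```python
    import random
    from collections import Counter
    from typing import Callable

    def answer_with_votes(ask: Callable[[str], str], question: str,
                          min_runs: int = 2, max_runs: int = 9) -> str:
        votes = Counter()
        for i in range(1, max_runs + 1):
            votes[ask(question)] += 1
            best, count = votes.most_common(1)[0]
            if i >= min_runs and count > i / 2:  # strict majority so far -> stop early
                return best
        return votes.most_common(1)[0][0]        # budget exhausted: return the plurality

    # Dummy model that is right ~92% of the time, just so the sketch runs.
    ask = lambda q: "correct answer" if random.random() < 0.92 else "wrong answer"
    print(answer_with_votes(ask, "some complex request"))
    ```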

  • @rowanmoore284
    @rowanmoore284 2 місяці тому +1

    Looking forward to the next video; a lot's happened over the last few weeks