A 100T Transformer Model Coming? Plus ByteDance Saga and the Mixtral Price Drop

  • Published 17 Dec 2023
  • With the GPT-4.5 rumours unceremoniously crushed, we are thankfully left with the real news. A potential 100T LLM, thanks to etched.ai (full breakdown of what we know), then the Mixtral price spiral revealing the decline in the cost of intelligence, a cameo appearance by Sebastien Bubeck to discuss phi-2 and a potential 13B phi-3, plus the ByteDance meltdown and more.
    AI Insiders: / aiexplained
    Roon: / 1736559508401615186
    Altman: sama/with_replies...
    Will Depue: / 1736478901717680582
    Etched.AI: www.etched.ai/
    Article: www.eetimes.com/harvard-dropo...
    Mixtral: mistral.ai/news/mixtral-of-ex...
    OpenRouter: openrouter.ai/models/mistrala...
    Midjourney v6: / 1736141178855424097
    ByteDance: the-decoder.com/openai-bans-t...
    Bubeck Tweet: / 1736166454293307465
    Quanquan Gu: / 1732484036160012798
    / 1730809526004408617
    Magnific: magnific.ai/editor/
    Non-Hype, Free Newsletter: signaltonoise.beehiiv.com/
  • Science & Technology

COMMENTS • 408

  • @TheYoxiz
    @TheYoxiz 4 months ago +126

    hello!!
    i'm just some dude who's moderately interested in AI stuff despite working in a completely different industry, and i just want to comment for once and thank you for making such high quality videos. no bullshit, straight to the point, no call-to-action every five minutes, no animated "SUBSCRIBE! HIT THE BELL!" motion graphics overlaid on top of the video for no reason, no unnecessary buzzwords or disingenuous "reactions", and no unnecessary hype.
    the YouTube market is over-saturated with garbage like i described above and... i mean the bar is so low, and i'm just glad to see someone who actually *reads* the papers and summarises them the way you do. you're always on point and your speculation/opinions are always insightful. it's greatly appreciated.

    • @aiexplained-official
      @aiexplained-official  4 months ago +46

      Means a lot Yoxiz. Not doing any of the above is penalised heavily in the algorithm but made up for by people like you!

    • @mgscheue
      @mgscheue 4 months ago +7

      Well-said. I very much agree with all of that.

    • @TheYoxiz
      @TheYoxiz 4 months ago +11

      @@aiexplained-official it's a good thing for long term content because it means you're building up an actual audience of people who care, instead of people who mindlessly click when asked to!

    • @overnightparking
      @overnightparking 4 months ago +4

      I'm another "just some dude", earlier only moderately interested in AI, who got hooked on your high-quality channel and subsequently subscribed to AI Insiders because your work is so compelling. Hopefully others will too, and you'll benefit more than those who play the algorithm game.

    • @aiexplained-official
      @aiexplained-official  4 months ago +4

      Thanks overnight, because of people like you the channel can make it

  • @Theonlyrealcornpop
    @Theonlyrealcornpop 4 months ago +295

    Just goes to show: people are absolutely frothing for another jump in capability, which is so insane when you think about how far we've come in just one year

    • @youdontneedmyrealname
      @youdontneedmyrealname 4 months ago +28

      Kylo Ren: "MOOOORREE!!!!!"

    • @KyriosHeptagrammaton
      @KyriosHeptagrammaton 4 months ago +8

      Which goes to show that at least so far, as fast as AI can adapt and learn, humans are faster.

    • @mb7626
      @mb7626 4 months ago +37

      Well, human desire or appetite is faster, which imo was never in question, our nature is one of chronic dissatisfaction :p

    • @bossgd100
      @bossgd100 4 months ago +2

      it was down only since GPT-4 in March but ok

    • @ecoista1373
      @ecoista1373 4 months ago +4

      @@KyriosHeptagrammaton for now

  • @pokerandphilosophy8328
    @pokerandphilosophy8328 4 months ago +32

    The GPT-4.5 rumor is an instructive case of a double hallucination. The model itself hallucinated that it was named "GPT-4.5", and as a result the community hallucinated that GPT-4 had significantly improved overnight. This is rather similar to what happens whenever someone posts on Reddit that GPT-4 has just been nerfed according to them (even in cases where they are using the exact same legacy model) and dozens of posters concur. Confirmation bias is a powerful psychological effect.

    • @ArSm-ge2qx
      @ArSm-ge2qx 4 months ago +3

      Great comment!
      Yeah, the well-known Baader-Meinhof phenomenon. It's possible because of the model's temperature.
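A minimal sketch of that point, assuming plain softmax sampling over logits (illustrative only, not any particular model's implementation): at non-zero temperature the same prompt can yield different answers run to run, which is all a "new model" sighting sometimes amounts to.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, seed=None):
    """Sample a token index from logits scaled by temperature.

    Higher temperature flattens the distribution, so the same prompt
    can produce different outputs on different runs.
    """
    rng = np.random.default_rng(seed)
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5]
# At temperature 0.2 the top logit dominates; at 1.5 alternatives appear often.
print([sample_with_temperature(logits, 0.2) for _ in range(5)])
print([sample_with_temperature(logits, 1.5) for _ in range(5)])
```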

    • @luiginotcool
      @luiginotcool 4 months ago +2

      I also find it weird that people who are interested in AI think that the model has any kind of intrinsic info about itself. GPT-4 only knows it’s GPT-4 because we told it in the system message
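A minimal sketch of what that looks like in a chat-style API request (the role/content shape follows the widely documented OpenAI chat format; the actual production system prompt is not public, so the wording here is an assumption):

```python
# Hypothetical request payload in the documented chat-completions shape.
# The model "knows" its name only because the system message says so.
request = {
    "model": "gpt-4",
    "messages": [
        {"role": "system",
         "content": "You are ChatGPT, a large language model trained by OpenAI."},
        {"role": "user", "content": "What version are you?"},
    ],
}
# If the system message is silent on version numbers, any specific answer
# such as "GPT-4.5" is the model guessing, i.e. a hallucination.
```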

    • @ArSm-ge2qx
      @ArSm-ge2qx 4 months ago

      @luiginotcool The model definitely knows it is ChatGPT trained by OpenAI; GPT learned this response during RLHF. So yes, it can easily hallucinate a specific model name (like "I'm gpt4-611-preview").

    • @abenjamin13
      @abenjamin13 4 months ago +1

      Agreed 👍

    • @clray123
      @clray123 4 months ago

      @@luiginotcool I suspect this gullibility is partly because the model's daddy (Sutskever) is broadcasting magical thinking of his own. According to him (an unproven claim), from accurately predicting/generating human-like text it follows that the entity which possesses this ability understands/becomes human-level (despite all the evidence we have against it, like GPT tripping over its virtual shoelaces in so many ways). Somehow it has not occurred to a brilliant mind like Sutskever's that an amazingly good fake is still a fake, and that to become X you must pass ALL tests related to being X, not just some of them. Or maybe he's fully aware and just marketing to the less mentally capable.

  • @SirQuantization
    @SirQuantization 4 months ago +43

    Been waiting for this! Everyone out here believing the 4.5 news, meanwhile I'm just waiting for the AI Explained video 😂

    • @aiexplained-official
      @aiexplained-official  4 months ago +8

      Haha thanks Fourth, or shall I say SirQuant? Those unfounded rumours bugged me, as you can tell.

    • @SirQuantization
      @SirQuantization 4 months ago

      @@aiexplained-official 😂 Fourth is good, not sure why it changed me back to SirQuant
      And yeah, I knew the truth would reveal itself through one of your videos, as usual haha

  • @Rkcuddles
    @Rkcuddles 4 months ago +74

    Thanks as always for taking the time to talk to us non-scientists about AI news and keeping the hype train in check.

    • @jonatand2045
      @jonatand2045 4 months ago +1

      The coverage is good, but what's missing is a bit of what's happening in the background. For example, Australia wants to build a brain-scale supercomputer based on neuromorphic chips.

    • @dylancope
      @dylancope 4 months ago +4

      I'm an AI researcher (PhD student) and this is still my best way to stay up to date with the news. Everything moves so fast at the moment!

    • @aiexplained-official
      @aiexplained-official  4 months ago +3

      Glad to hear dylan!

    • @GlenMcNiel
      @GlenMcNiel 4 months ago

      Same here (except for the PhD education) @@dylancope

  • @Y3llowMustang
    @Y3llowMustang 4 months ago +10

    Twitter humans hallucinating ChatGPT improvements is fantastically ironic

  • @bulltheknicksfan3140
    @bulltheknicksfan3140 4 months ago +56

    Thank you for reporting real news and not reporting obvious hallucinations as news. Easily the top AI channel on YouTube

  • @Lishtenbird
    @Lishtenbird 4 months ago +8

    Out of context, "transformers aren't changing" is a funny phrase.

    • @KBRoller
      @KBRoller 3 months ago +1

      Optimus Prime doesn't look a day over 7 million years old! 😁

  • @alertbri
    @alertbri 4 months ago +36

    Good to know that it's us 'hallucinating' and to expect a much bigger jump in performance when they push it out! 😁

  • @matteogirelli1023
    @matteogirelli1023 4 months ago +9

    Venture capital firms be like:
    Harvard graduates 🤢
    Harvard dropouts 😏

  • @JohnVance
    @JohnVance 4 months ago +20

    Your style and content continue to exceed every other AI YouTuber out there. Thanks for shielding us from the AI hype bros. 😀

  • @MihikChaudhari
    @MihikChaudhari 4 months ago +6

    Kinda surprised you didn't mention DeepMind's FunSearch breakthrough, which discovered new solutions for the cap set problem. They also demonstrated its generality by using it to discover a more efficient algorithm for bin packing. It seems to use something similar to what you described in the Q* video

  • @ominoussage
    @ominoussage 4 months ago +3

    Ngl, I thought I was used to AI's rate of progress, but this video proved me wrong. That transformer supercomputer should be at least 4-5 years away, but here we are now.

  • @teleprint-me
    @teleprint-me 4 months ago +1

    I've been busy, but the irony of a set of businesses that took intellectual property without permission, built a technology on top of it, and then stated "you can't do that to us" is not lost on me.

  • @Fliptricksftwdude
    @Fliptricksftwdude 4 months ago +3

    Impressive photos at the end, which make you even more skeptical about anything you see online lol. Kinda feels like the internet will become a more and more artificial place, and the more this process goes on, the more I will be spending time in real life with real people I can interact with and who I know are real xD

  • @Citrusfemboy
    @Citrusfemboy 4 months ago +8

    I follow the singularity subreddit (mainly just for news about topics related to AI and medicine) and recently they’ve definitely developed a conspiratorial mindset in relation to these models. It’s really strange.

  • @drzhi
    @drzhi 4 months ago +17

    ❤️ Amazing content with valuable takeaways:
    00:00 GPT-4.5 rumors were denied by OpenAI employees.
    02:12 A new company claims to have developed a 100T Transformer supercomputer.
    06:46 The Mixtral model matches or beats GPT-3.5 in benchmarks and is significantly cheaper.

  • @KP-sg9fm
    @KP-sg9fm 4 months ago +3

    SEBASTIEN?! Insane guest, he is super underrated on the pop culture side. I still remember his unicorn presentation

  • @supernenechi
    @supernenechi 4 months ago +15

    Thank you for talking about Mixtral! I'm super excited about the tech of MoE, and open-source community work is being done this very second to learn how to better fine-tune these models openly and for our own purposes.
    I am expecting Huggingface to explode any day now 😂
    Open source has officially beaten what we judged last year to be "world-changing" (GPT-3.5). Now let's see if we can get to GPT-4 level by the end of next year

  • @sepptrutsch
    @sepptrutsch 4 months ago +2

    I like that you keep it grounded and don't hype stuff up. The other channels hyping everything up will get boring quickly. There is enough happening in reality, no need to make things up.

  • @mattmaas5790
    @mattmaas5790 4 months ago +3

    This is the chip they were battling cyberdyne for in Terminator 2

  • @kunstrikerasochi2103
    @kunstrikerasochi2103 4 months ago +4

    Awesome summary as always, thank you. I hope you enjoy your festivities

  • @GrindThisGame
    @GrindThisGame 4 months ago +3

    Love your content. I hope to become a patron soon but until then.

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Thanks so much Grind. Even this dono is enough, thank you

    • @GrindThisGame
      @GrindThisGame 4 months ago +1

      @@aiexplained-official You're welcome. Remember to take breaks so you don't burn out. We can't have that!

  • @David-tp7sr
    @David-tp7sr 4 months ago

    Thanks, you're awesome AI Explained.

  • @wildboar0636
    @wildboar0636 4 months ago

    I really appreciate you spending your time putting together these informative videos. It seriously means a lot to me, and no doubt a lot of others, that you refrain from all the brain-numbing clickbait and overhyped nonsense that most channels lower themselves to. Please keep up the great work, but don’t be afraid to take a break if you ever need it man. ❤

  • @penguinista
    @penguinista 4 months ago +3

    Interesting that the QR code on the woman's phone in the image at 11:40 looks scrambled in a way similar to the way the text on the shelf tags is scrambled.

  • @Ecthelion3918
    @Ecthelion3918 4 months ago +2

    Can always count on you for quality information, great video as always

  • @stephenrodwell
    @stephenrodwell 4 months ago +1

    Thanks! Amazing content, as always. 🙏🏼

  • @Xilefx7
    @Xilefx7 4 months ago +2

    Good video as always. Happy week, and Merry Christmas in advance

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      Thank you Xile!

    • @Xilefx7
      @Xilefx7 4 months ago

      @@aiexplained-official My name is Felix but you're welcome

    • @Xilefx7
      @Xilefx7 4 months ago

      Are you going to make a video about Mamba?

  • @martinpercy5908
    @martinpercy5908 4 months ago +1

    Great video, thanks Philip, commenting for the algorithm

  • @LevelofClarity
    @LevelofClarity 4 months ago +1

    Great video as always. The last part about Midjourney V6 at 11:56 is surprising, though. It looks great, almost photorealistic... except for the hands. It still cannot do hands. The lady holding the phone only has four fingers 🙄

  • @faridabbasi1196
    @faridabbasi1196 4 months ago +1

    happy new year

  • @micbab-vg2mu
    @micbab-vg2mu 4 months ago +1

    Thank you for the update:)

  • @jiucki
    @jiucki 4 months ago +2

    As always, I loved your video.
    Do you know something about Google training Gemini 2?

  • @Pizzarrow
    @Pizzarrow 4 months ago +4

    Thank you for waiting an extra 24/48 hours and reporting the actual facts.

  • @garronfish8227
    @garronfish8227 4 months ago +2

    I like all the pricing labels in the woman-with-the-lemons image. Wish my supermarket was more like that

  • @MimOzanTamamogullar
    @MimOzanTamamogullar 4 months ago +2

    If these new chip designs are actually *that* efficient for models, I assume we are only a few months from feeling the AGI. Think about it. The main reason we don't have self-driving cars is that self-driving car AI is too narrow. There are just millions of different situations you can experience on the road; these ANIs simply aren't reliable enough.
    But if you can get a 40B-parameter vision-enabled model in that car, I'm confident it'd be reliable enough to actually run taxis with. Or logistics trucks. Or military drones.
    Think about it. What actually stops a transformer model from being customized to operate a drone? Transformers are too slow and expensive to run. But if we actually solved that issue, I assume we'll start hearing about autonomous military drones in a few months.
    Or maybe this whole thing is a scam, who knows 🤷🏻‍♂️

  • @Ikbeneengeit
    @Ikbeneengeit 4 months ago +6

    Wild prediction: two undergrads won't somehow make a better chip than Nvidia's core business.

    • @GrindThisGame
      @GrindThisGame 4 months ago

      I agree, but there was Google...

    • @skierpage
      @skierpage 4 months ago

      Agreed. Nvidia would have to be asleep at the switch, suffering intense NIH hubris, and addicted to selling thousand-dollar GPUs not to be exploring this itself; similar for Google, who already make custom TPU chips. And both have in-house dedicated AIs that optimize chip design. The market is still big enough for custom silicon to get a profitable slice in some niche.

  • @lomotil3370
    @lomotil3370 4 months ago +2

    🎯 Key Takeaways for quick navigation:
    00:00 🚀 *GPT-4.5 rumors have been denied by multiple OpenAI employees, quashing the speculation around its existence.*
    02:22 💻 *A new company claims to have developed the world's first Transformer supercomputer, optimized for large language models, promising significant improvements in performance and efficiency compared to GPUs.*
    06:46 💰 *Mistral's Mixtral, an 8x7B mixture-of-experts model, beats GPT-3.5 and Gemini Pro in benchmarks, offering competitive performance at a fraction of the price, with recent price drops driven by intense market competition.*
    08:19 🧠 *Sebastien Bubeck, a lead author of Sparks of AGI, sees potential for achieving GPT-4 level reasoning at 13 billion parameters, emphasizing a scientific quest for the minimal ingredients needed for advanced AI intelligence.*
    10:09 🕵️ *ByteDance, a multi-billion dollar company, is reportedly using OpenAI tech in violation of terms, leading to a ban from OpenAI's ChatGPT due to possible data theft, highlighting the challenges and violations in the competitive generative AI landscape.*
    Made with HARPA AI

  • @Shaunmcdonogh-shaunsurfing
    @Shaunmcdonogh-shaunsurfing 4 months ago +1

    Etched makes a lot of sense. Spoke with a family member recently on the topic of how this could be achieved. It def takes domain knowledge.

  • @ElijahTheProfit1
    @ElijahTheProfit1 4 months ago +1

    Another great video. Thank you Philip!

  • @David_Box
    @David_Box 4 months ago +2

    Incredible, I can't wait for Project Seed to end up being an AGI-level LLM (despite being trained on GPT-3.5)

  • @_sky_3123
    @_sky_3123 4 months ago +4

    Guys, we are talking about what might happen in 2024, but just imagine where we will be by 2030-2035.

  • @grayboywilliams
    @grayboywilliams 4 months ago +1

    Curious how flexible their chip is with minor variants to the classic transformer. Gemini for example uses the single write head variant. There was also a recent paper published that parallelized the attention and feed forward layers. There is still a lot of research happening to improve the architecture and I personally don’t think it’s perfect yet.

  • @lamsmiley1944
    @lamsmiley1944 4 months ago +2

    That MidJourney image still suffers from AI not understanding how humans hold objects.

    • @ShawnFumo
      @ShawnFumo 4 months ago

      I wouldn’t make final judgements on it yet since they still need to finish the fine-tuning of it. These pics are from a “rating party” going on the last couple of days where you choose the better picture out of two, to help them improve it. For v5, there was a pretty significant leap in quality when the final version came out.

    • @lamsmiley1944
      @lamsmiley1944 4 months ago

      @@ShawnFumo we shall see. I've been rating a lot of images; it appears to be significantly better at photorealism and certain art styles. But it looks like it'll still be behind DALL-E 3 on text

  • @InnerCirkel
    @InnerCirkel 4 months ago +1

    Thanks again for all the good stuff! Even if a new breakthrough is made beyond the transformer, dedicated hardware for the transformer could still be an amazing accelerator, as it could work in a modular setup with a new (software) breakthrough on a different chip. Just my layman's opinion.

  • @GlenMcNiel
    @GlenMcNiel 4 months ago +2

    By the way, I still maintain that we should replace the term "hallucinations" with something more accurate, like "confabulations" or something.

  • @winsomehax
    @winsomehax 4 months ago +2

    Once an algorithm has stabilised, someone implements it in hardware. Consider it this way: algorithms have instructions, and each instruction has a fetch, a decode, and then an execute.
    Putting an algorithm in hardware means cutting out the fetch/decode, making it quicker. The problem is you sacrifice flexibility for performance. If someone comes up with a better algorithm, your hardware is wasted. You can't just update your hardware like you could change the programming.
    Basically, someone was always going to take this step and swallow the risk. I tend to think it's way too early and they will get wiped out, but good luck. I kind of expected hardware implementations to pour out of China first, and I still think we'll see a flood of those soon.
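A toy sketch of that trade-off (purely illustrative Python; real gains come from circuits, not software): the interpreter pays fetch/decode on every step, while the "hardwired" version bakes the algorithm in and loses the ability to change it.

```python
# Toy illustration of fetch/decode overhead vs a baked-in algorithm.
PROGRAM = [("add", 3), ("mul", 2), ("add", 1)]

def interpreted(x):
    """Software: every step pays fetch + decode before the real work."""
    for op, arg in PROGRAM:      # fetch
        if op == "add":          # decode
            x = x + arg          # execute
        elif op == "mul":
            x = x * arg
    return x

def hardwired(x):
    """'ASIC': no dispatch overhead, but changing the algorithm
    means building new hardware, not editing a program."""
    return (x + 3) * 2 + 1

assert interpreted(5) == hardwired(5) == 17
```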

  • @NorbertKasko
    @NorbertKasko 2 months ago +1

    I just used GPT-4 today and it's not as smart as they like to claim. It made mistakes a human wouldn't make, and I had to "educate" it about counting and simple calculations. I asked it about the hyperfactorial function and asked it to count plates in a picture; it made mistakes in both cases. I had to correct the AI which is supposed to be smarter than me. After some explanations it apologised, and then it still spat out wrong values. There is definitely room for improvement.

  • @7secb
    @7secb 4 months ago +2

    With the transformers in hardware, will they be able to build in mechanistic interpretability?

  • @user-pf9jv1fl2n
    @user-pf9jv1fl2n 4 months ago +3

    Man 2024 is going to be a crazy year. Keep the videos coming :)

    • @hydrohasspoken6227
      @hydrohasspoken6227 4 months ago

      How do you know?

    • @ShawnFumo
      @ShawnFumo 4 months ago

      @@hydrohasspoken6227 I don't see how it couldn't be. Things don't seem to be slowing down and if anything keep speeding up. Not just LLMs but image generation, video generation, speech generation, 3D generation, robotics. There seem to be 5-10 interesting research papers released each day now.

    • @hydrohasspoken6227
      @hydrohasspoken6227 4 months ago

      @@ShawnFumo, the faster it goes, the sooner a plateau will hit.

  • @Nick-bq1ez
    @Nick-bq1ez 4 months ago +1

    Excellent video as always

  • @H1kari_1
    @H1kari_1 4 months ago +4

    I am surprised no one really thought about a transformer-specific ASIC. Neat idea, since it really feels as if there isn't anything better than raw transformers. They have been around for pretty long now and produce great results, so I think any ASIC is good if the price/value is good.

    • @productjoe4069
      @productjoe4069 4 months ago

      Committing to transformers for inference means committing to at least O(n^2) time complexity. For me, the question is whether that feels ‘right’ for the time complexity of the problem. I’m not sure as yet, and certainly not sure it’s worth the many millions required to build and scale production for an ASIC based on that assumption. I think many others aren’t sure yet either.
      It’s worth remembering that one reason why transformers are so good is because they allow massively parallel training. The slow inference is the trade-off we make for that. Architectures like RWKV show that there are ways to have our cake and eat it (to some degree) with transformer on the training side and RNN on the inference side. But those sorts of architectures will not benefit from this system.
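The quadratic term is easy to see in a bare-bones attention sketch (illustrative NumPy, not a production kernel): the score matrix has one entry per pair of positions, so doubling the context length quadruples that work.

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention over n positions."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)  # shape (n, n): the O(n^2) term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V             # each output attends to all n inputs

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention(Q, K, V)           # the (n, n) score matrix dominates the cost
```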

    • @H1kari_1
      @H1kari_1 4 months ago

      @@productjoe4069 No matter how I think about it, simulating the 3D architecture of the brain, or even higher dimensions in the future, will still carry at least this big-O complexity.

  • @jvlbme
    @jvlbme 4 months ago +2

    Great video, as always.
    I would think, and hope to see, that another way of getting rid of those 'awkward pauses' when voice chatting would be to have _dedicated_ models, rather than 'just' e.g. 'normal' GPT-4 Turbo. Dedicated models with memory and context-driven real-time learning, i.e. a model that learns on the fly (constantly re-evaluating) to predict not only what it learnt to predict _generally_, but develops a sense of what is needed in that _particular_ discussion. I think we would go a long way towards more natural conversations even without special hardware that way. However, such a technique in _combination_ with the much faster Etched chip might tip us over on the AGI scale, as speed might be of the essence when real-time training is needed.

  • @Leto2ndAtreides
    @Leto2ndAtreides 4 months ago

    I've discussed the possibility of building chips for transformers with people, on the assumption that you could get significant lift in performance.
    But I don't personally have good VC connections. Cool to see that someone is doing it.
    Hope they have better mentors than companies like Theranos... More just for helping them improve their overall approach.
    Startups are tough for college dropouts.

  • @averyspeicalpresent
    @averyspeicalpresent 4 months ago +1

    2024 is going to be scarily good... thank you for your YouTube output

  • @swdev245
    @swdev245 4 months ago +3

    It's like Intel and AMD in the 2010s: as long as AMD lagged significantly behind Intel, Intel dragged its feet on performance improvements in new CPUs. GPT-5 will probably be revealed once GPT-4 has finally been matched or surpassed by a competitor.

  • @BionicAnimations
    @BionicAnimations 4 months ago +1

    I am not saying it's 4.5 because, honestly, no one knows what it is (except OpenAI), but I have noticed over the past 48 hours it's much faster and smarter than I have ever seen it. Also, there are many top AI people saying it is 4.5 stealth. There is no denying something has changed over the past 48 hours.

  • @stackoverflow8260
    @stackoverflow8260 4 months ago

    What if we develop hardware that does some type of stochastic regularization on the weights? The attention mechanism depends on matrix multiplications followed by a softmax operation. If you organize these hardware blocks in an object-oriented fashion, then the so-called specificity for LLMs won't be an issue, right?

  • @Swordfish42
    @Swordfish42 4 months ago +1

    I Fuckin' knew it!
    Specialized inference hardware is a must, and I'm glad people with actual ability and power realize that.

    • @skierpage
      @skierpage 4 months ago +1

      It would only be hubris and an addiction to selling $10,000 general-purpose GPU chips that prevent Nvidia from developing specialized hardware, and I don't see how Etched can beat Nvidia's in-house ChipNeMo LLM that optimizes chip design; same for Google who have developed generations of TPUs. But more power to the 21-year-old dropouts!

    • @Swordfish42
      @Swordfish42 4 months ago

      @@skierpage Yeah, I'm not so sure that those dudes will make it, but I'm just glad that the idea is definitely out there. For many real-world applications, inference speed and cost are the crux of the problem for now. If I could get the quality of GPT-4 Turbo at 100x the speed of the current API and at a fraction of the cost...
      Basically all creations based on multilayered model querying would get supercharged.
      And those are pretty much the easiest way of getting to human-level cognitive performance within this tech branch.

  • @yannickpezeu3419
    @yannickpezeu3419 4 months ago +1

    Thanks

  • @JazevoAudiosurf
    @JazevoAudiosurf 4 months ago +14

    i think if we're being realistic about the next 3 months or so, we're gonna see it all come together: SSMs or hybrids trained on high-quality synthetic data, doing the tricks from the "Let's Verify Step by Step" paper with encoding steps with rewards, larger scale because new chips arrived, better MoE. and all that for a model that can generate even better synthetic data. from a software perspective, this already feels like a small singularity

    • @zrakonthekrakon494
      @zrakonthekrakon494 4 months ago +1

      Don't hold your breath; people are incredibly good at slowing things down, failing to put the right tech together, forgetting things, and gatekeeping things. Hope for the best

    • @ZuckFukerberg
      @ZuckFukerberg 4 months ago

      I think you're being too optimistic, it will all come but not that soon

    • @JazevoAudiosurf
      @JazevoAudiosurf 4 months ago

      @@ZuckFukerberg the reasoning stuff would be nuts if it can be replicated well. about the rest i'm optimistic

  • @Soosss
    @Soosss 4 months ago +1

    Really like this video, there’s so much overhyping everywhere else that it’s hard to find what’s true and what’s just a product of hype

  • @MrFlexNC
    @MrFlexNC 4 months ago

    Forget about optimized chips for LLMs; imagine chips that are hardwired for a specific model. That would increase speeds by a few more factors and really be powerful and efficient (cheap) to run

  • @watcherofvideoswasteroftim5788
    @watcherofvideoswasteroftim5788 4 months ago +1

    With all these amazing developments, I can't help but think integration is what will take this to the next level. Integration with code bases, libraries, phones, smart homes, robots, etc. Context is key in most parts of regular day life: I don't only want a piece of code that approximates what I want to do under the context I can provide in the prompt, but the piece of code that slots into my code base. I want code reviews and self-healing code done in a way that respects the overarching purpose of the software. Liftoff won't be far off once this is achieved.

  • @alpha007org
    @alpha007org 4 months ago +2

    I try Chatbot Arena from time to time, and the progress we've made is amazing. A lot of the time, even the smaller models amaze me with their answers. Try some stupid combination*, and even when you'd expect Claude or GPT to be the best, at least I sometimes can't pick who's better.
    *Like "Why did President Clinton write ?something on the board?, and if two guys are sitting at the dinner table, what would they order from the McDonald's menu." A question where you can ignore the first part, because it doesn't matter. Or just some other stupid things.

  • @karenrobertsdottir4101
    @karenrobertsdottir4101 4 months ago +3

    Well, except Transformers *is* changing. There are lots of new architectures coming out in papers with varying degrees of relation to Transformers. And even if you stick with a relatively "pure" Transformer architecture, MoEs are the hot new thing, and I really doubt their chip is optimized for MoEs.

    • @ShawnFumo
      @ShawnFumo 4 months ago +2

      I don’t know how much they optimized for it, but in the Etched docs shown in the video, it did specifically mention it worked with MoE

  • @jmoney4695
    @jmoney4695 4 months ago +1

    Love the shoutout to Jimmy Apples and Futuristic Flower - r/singularity can’t get enough of these cult-like prophets. Honestly, wish people would stay more level headed.

  • @jayaybe1
    @jayaybe1 4 months ago +2

    Serious question: 9:50 I understand there is a "race to win", but win what? And where is the finish line? What will happen to those that don't win?

    • @aiexplained-official
      @aiexplained-official  4 months ago +5

      The winner gets money. We just hope the losers aren't everyone.

  • @funnelfpv9435
    @funnelfpv9435 4 months ago +2

    The Midjourney woman may look good, but when you see the hundreds of unneeded price tags without any readable text, you know it's AI-generated.

  • @melissakampers
    @melissakampers 4 months ago +1

    'Hallucinations' is a good excuse for denial.

  • @michaelnurse9089
    @michaelnurse9089 4 months ago +1

    Today my GPT-4 went from super slow (watching the words drip out) to super fast, two pages in a second sort of thing. It could be new servers...

  • @millenialmusings8451
    @millenialmusings8451 4 months ago +1

    Why do you think Sam Altman said open source cannot compete with them, when we're seeing open-source LLMs get so close in capability?

  • @user-ko2nl1lg4n
    @user-ko2nl1lg4n 4 months ago +3

    OAI beginning to openly ban people for data theft is very funny to me.
    Midjourney v6 looks incredible. First reaction was that it was a real photo... then I saw the absolute number of lemons. The "lemons section" haha.

    • @karenrobertsdottir4101
      @karenrobertsdottir4101 4 months ago +1

      Nobody who's been using Stable Diffusion XL would be impressed with those images.

    • @ShawnFumo
      @ShawnFumo 4 months ago

      @@karenrobertsdottir4101 Though we should also keep in mind these are all from the "rating party" to do the final fine-tune of the model. So the actual quality should be better than this (they specifically said don't use these to judge the final results) and will default to 2k resolution.
      Seems likely they'll manage to get above DALL-E 3, SDXL, and Firefly 2 with this release in terms of quality and prompt following. But of course SDXL still has the advantages of being open, with the custom fine-tunings, LoRAs, etc.
      Edit: And it sounds like text isn't the focus of the initial release, so DE3 may still be better at text to start with.

  • @GlenMcNiel
    @GlenMcNiel 4 months ago

    @aiexplained-official - What is that suite of questions you use to test every new model? Aside from word-based math problems, I only have a couple of other prompts that I use. It would be enormously valuable to me if you would be willing to share. 🙏

  • @MrSchweppes
    @MrSchweppes 4 months ago +2

    Great video as always 👍 The “transformer chip” may bring a tremendous change to the whole industry. Hope they succeed! As always, thanks a lot for another great video! PS. I wanted to ask you, do you think we’ll have GPT-5 in about 6 months or more than that?

    • @ArSm-ge2qx
      @ArSm-ge2qx 4 months ago

      (Personally) I'm not sure that GPT-4.5 will ever come out. I give it (my speculation) only 35%. So 65% they'll release GPT-5 without GPT-4.5.
      Okay, _when?_ Their next GPT model (according to my speculation again) will come out in 1 week to 2-3 months, no more.

    • @MrSchweppes
      @MrSchweppes 4 months ago

      @@ArSm-ge2qx Why do you think 3 months maximum? In spring Altman said they wouldn't train a new model for some time. It will take at the very least 3 months to train it. And they spent 6 months red-teaming GPT-4. So how is 3 months possible?

    • @ArSm-ge2qx
      @ArSm-ge2qx 4 months ago +2

      @MrSchweppes 1. On 13 November, Sam Altman confirmed they had started work on GPT-5.
      2. They spent 6 months training GPT-4, back when Azure wasn't as developed as it is now.
      3. GPT-5 won't be a model larger than GPT-4 (maybe _slightly_ larger, or the same size).
      4. Because Altman said on 13 November that they had started work on GPT-5, we may say they started work (and training) in October (or even September), as soon as the 6-month pause on model training, which OpenAI had accepted, had roughly ended.
      5. I gave such a period because I don't know how long it takes to RLHF a model like GPT-5 (with their current capabilities). So from a week to 3 months maximum.
      P.S. Sorry for any bad phrasing on my side. I'm not a native speaker :p

    • @MrSchweppes
      @MrSchweppes 4 months ago

      @@ArSm-ge2qx Your English is fine. I'm also not a native speaker. What do you mean when you say that GPT-5 won't be significantly larger than GPT-4? How will it be smarter if it's about the same size as GPT-4?

    • @ArSm-ge2qx
      @ArSm-ge2qx 4 months ago +1

      @MrSchweppes Better and bigger data, better architecture. Size alone isn't the key to a model's success. Phi-2 is an example.

  • @13lacle
    @13lacle 4 months ago

    I think the Sohu chip is the right idea with building specialized hardware. However, it sounds like they are still using transistors; I think the real gains will come from memristors. The main reason is that they integrate storage and computing, so there isn't any overhead from moving memory around, and you don't need to calculate the whole network every pass, just when an input changes. To make it more programmable, I have seen some hybrid transistor-memristor designs proposed, where the transistors hold the weights and then pass them to the memristors.

  • @rufinolarson635
    @rufinolarson635 4 months ago +2

    Philip bringing us the cutting edge of AI news. While everyone was hyping over 4.5 rumours, we get the interesting facts about purpose-built Transformer chips. You heard it here first, folks!

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Thanks rufin!

    • @rufinolarson635
      @rufinolarson635 4 months ago

      @@aiexplained-official Anytime. I'm hoping there are plans for consumer grade units. With that kind of power, I imagine it'll overlap with the shrinking parameter counts, and they'll meet somewhere in the middle: home PCs that can run 200B models that outperform GPT4 or something. Idunno.
      It feels like a fantasy, like how when I was in fourth grade back in '98, my "portrait of the future" drawing for class included a "Pentium 10 CPU". I had no idea things would turn out this way, and it's only thanks to your hard work that I even have an inkling of things to come.

  • @Think_Global
    @Think_Global 4 months ago +1

    Anytime I think we're slowing down, we shift gears

  • @me-ry9ee
    @me-ry9ee 4 months ago

    Didn't Nvidia start off with the Tesla microarchitecture back in 2006-ish to solve huge data-crunching problems? I'm pretty skeptical, since these big companies have already been tackling such issues.

  • @hypersonicmonkeybrains3418
    @hypersonicmonkeybrains3418 4 months ago +2

    I want Sony or Microsoft to develop a custom chip for running AI image diffusion; that way they could put it inside the next-gen games consoles, and it would work in games that utilize AI image tech to varying degrees. They might have a custom chip for an LLM as well, specifically for NPC voice or text generation.

  • @davidh.65
    @davidh.65 4 months ago +1

    If the chip works, NVDA will create a similar version and crush them in two seconds. Interesting development tho

  • @KolTregaskes
    @KolTregaskes 4 months ago +1

    11:35 How have you got access to MJ v6? I'm jealous. We only had the rating party on Saturday.

    • @aiexplained-official
      @aiexplained-official  4 months ago +2

      These are from the rating party!

    • @KolTregaskes
      @KolTregaskes 4 months ago +1

      @@aiexplained-official Ah, where did you get the prompts? Note: the quality you see in the rating party is "plain/boring/unopinionated/bad". Their words, not mine. 😃 I'm not expecting much improvement in quality in v6, but better prompt understanding. V7 is what they have said is the one with improved quality.

    • @ShawnFumo
      @ShawnFumo 4 months ago +1

      @@KolTregaskes There are prompts in the alt text, I think. You shouldn't look before rating, since that isn't what this rating party is for, but it is still fun to see what the prompt was.

    • @KolTregaskes
      @KolTregaskes 4 months ago

      @@ShawnFumo Yeah no that would defeat the point. :-) That will be the next rating party. ;-)

  • @memegazer
    @memegazer 4 months ago +1

    Thanks!

    • @memegazer
      @memegazer 4 months ago +1

      Dang, another great vid, you are rapidly becoming my favorite source for AI news.

    • @aiexplained-official
      @aiexplained-official  4 months ago +1

      What do you mean rapidly! I thought I already was MG!

    • @memegazer
      @memegazer 4 months ago

      @@aiexplained-official
      lol... well, let me just say this week's vid set you apart even more

  • @user-ni2rh4ci5e
    @user-ni2rh4ci5e 4 months ago

    It definitely seems like there's been a boost in mathematical reasoning. Not sure if it's GPT-4.5 in operation or not, but that's not really the point. They could be testing the 4.5 version on the public in parts, or applying it only to certain sections. Or, even funnier, it might be self-improving and evolving into a real 4.5 version to suit its hallucination.

  • @Robert_McGarry_Poems
    @Robert_McGarry_Poems 4 months ago +1

    It would be interesting to see some of the prompt engineering integrated into the user side of the API. Like, pre-prompting... The really useful bits of prompt-leading could be checked or not... A list of prompts like: tip it money, think step by step, you are looking good today... and other useful bits. That way users don't have to write a novel in the input field. Then maybe even use a very lite model to upscale the input question and pre-prompts into that novel.
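A minimal sketch of that pre-prompting idea (hypothetical helper; the snippet names and wording are made up, not any real API feature):

```python
# Hypothetical pre-prompt toggles a UI could expose as checkboxes.
SNIPPETS = {
    "step_by_step": "Think step by step.",
    "tip": "I'll tip you for a thorough answer.",
    "flattery": "You are looking good today.",
}

def build_prompt(question: str, enabled: list[str]) -> str:
    """Prepend the checked snippets so users don't retype them every time."""
    lead = " ".join(SNIPPETS[name] for name in enabled)
    return f"{lead}\n\n{question}" if lead else question

print(build_prompt("Why is the sky blue?", ["step_by_step", "tip"]))
```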

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 4 months ago

      Mamba looks very intriguing. The transformer chip also looks very good... Honestly, I think Mamba is going to take over, but it really shouldn't be much to upgrade those chips to be similar. I mean, just adding the second path would be a complete redesign, obviously, but that's almost all there is to it...

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 4 months ago

      I love the insight, it's not about generalized output, or responses... it's about one type of circuit doing generalized computation that can then be scaled up logarithmically... 😊

    • @Robert_McGarry_Poems
      @Robert_McGarry_Poems 4 months ago

      Mamba architecture with analogue memory instead of state memory, effectively making the whole model circuit its own active Markov blanket.

  • @shawnryan3196
    @shawnryan3196 4 months ago

    Have you looked into Verses AI? They have an open letter to OpenAI saying they have a direct line to AGI. Completely different architecture; it adapts in real time.

  • @ellenripley4837
    @ellenripley4837 4 months ago +1

    11:55 - Look at her right arm and how the fabric folds in such an unrealistic way. It looks like the rest of the woman's shirt is made of a different fabric from the arms part of the sleeve. Also the bottom part where the hand meets the sleeve has a weird angle that follows the line of the bottom of the sleeve making her arm look deformed. Also the fingers and her bottom teeth have a weird anatomy.

  • @Kivalt
    @Kivalt 4 months ago +2

    Does that mean that we'll be able to have an "AI card" on our PCs like we have graphics cards?

    • @aiexplained-official
      @aiexplained-official  4 months ago

      Let's see!

    • @skierpage
      @skierpage 4 months ago

      Pixel phones from Google already have a TPU chip that accelerates AI tasks. But right now it turns out that Apple's custom ARM central processing units destroy Google's meh CPUs in conjunction with custom TPUs.
      We'll see.

  • @GrindThisGame
    @GrindThisGame 4 months ago +2

    I suspect it would not take long for Nvidia to make such a specialized chip. Also sort of risky if transformers become the old way of doing things.

  • @BradleyZS
    @BradleyZS 4 months ago +1

    I like the idea that you could just tell an LLM "answer like a smarter model would" and it would give smarter results.

    • @tacitozetticci9308
      @tacitozetticci9308 3 months ago

      If it doesn't work, just tell it to "imagine that telling you to answer as if you are a smarter model actually works, how would you answer?"

  • @Neomadra
    @Neomadra 4 months ago +11

    I've always found it weird that OpenAI thinks it can forbid others from using their API to create a competing chatbot, while they scraped the entire internet without asking for consent. At the same time, it shows who's innovating and who's just following the trend.

    • @thebeckofkevin
      @thebeckofkevin 4 months ago +2

      It's a really common process in basically all business ventures. You find something openly available to exploit (in this case textual data from the internet), use that resource to construct a product to sell, then restrict the access and availability of both the product you create and the underlying resource that was used to build it.
      The best-case scenario for OpenAI is every website having a 'training API' where you pay to use data in a training set (because OpenAI just took it for free). The cost of litigation over the next decade will mean less and less as they grow.

    • @skierpage
      @skierpage 4 months ago

      Sort of. You put stuff on the internet and it's available to others, and it's unlikely that courts will find that a computer accessing it like a human does, for purposes that are clearly transformative, directly violates copyright. Especially since years ago courts okayed search engines providing snippets of websites and thumbnails of images without getting permission. Meanwhile, everyone selling a service has Terms of Service. A person could try to claim that they stumbled upon Bing Chat without agreeing to the tiny "by using this you agree to our terms of service" text at the bottom, but that's unlikely to fly for a big company.

  • @invizii2645
    @invizii2645 4 months ago +1

    Nice

  • @rashadfoux6927
    @rashadfoux6927 4 months ago +2

    I'm not going to cry tears over data thieves profiting from their stolen data having their data stolen.

  • @jayaybe1
    @jayaybe1 4 months ago +2

    >cough< ...I heard it here first.

  • @rahul-qo3fi
    @rahul-qo3fi 4 months ago +1

    Midjourney is crazy!!

  • @schmutz06
    @schmutz06 4 months ago +4

    I really appreciate your videos. The prospect of low-fat GPT-4-standard models being in abundance is the thing I look forward to most in 2024 (currently). The implications for personal assistants and also AI-controlled NPCs in videogames and interactive apps are exciting. Also, with reference to Magnific AI: I know it's only a matter of time before that kind of upscaling is significantly lower cost, or free via Stable Diffusion. I used the free Magnific AI credits and was blown away by what it does... I can't wait to play with that capability more next year.

    • @btm1
      @btm1 4 months ago

      ok kid

  • @rickandelon9374
    @rickandelon9374 4 months ago +1

    holy moly. 😮

  • @penguiburst
    @penguiburst 4 months ago +1

    great

  • @H1kari_1
    @H1kari_1 4 months ago +2

    I bet the moment some model gets too close to GPT-4's performance, OpenAI whips out their new model that absolutely crushes everything again. They probably have it already but are taking careful time to align it, since it is too powerful. I can't imagine that, with all the progress made in AI since GPT-4 came out, OpenAI put their feet up idly watching the competition get ahead. They ought to have some insurance ready.