🐍 Mamba2 8B Hybrid 🚀: NVIDIA stealth-drops their latest Mamba2 model!

  • Published 12 Sep 2024
  • The new Nvidia Mamba2 8B Hybrid LLM is here, and it's shaking things up. This video dives deep into the advancements it brings, potentially offering faster performance than traditional Transformer models. Could this be the future of large language models? We'll explore the rumors, specs, and what this means for the field of NLP. Join us to unravel the mystery of the Nvidia Mamba2 8B Hybrid LLM!
    Tell us what you think in the comments below!
    ------------------------
    Mamba 2 Hybrid 8B Hugging Face Card: huggingface.co...
    Mamba-2 Release: x.com/_albertg...
    Faro-Yi-9B-DPO: x.com/01AI_Yi/...
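
    For anyone who wants to poke at the checkpoint from the Hugging Face card above, here is a minimal, hypothetical loading sketch in Python. The repo id and the plain-transformers loading path are assumptions on our part; NVIDIA's official release may require Megatron-LM instead.

        # Hypothetical sketch: the repo id below is an assumption, and the
        # official checkpoint may need Megatron-LM rather than transformers.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        model_id = "nvidia/mamba2-hybrid-8b-3t-4k"  # assumed repo id
        tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

        prompt = "State-space models are"
        inputs = tok(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=32)
        print(tok.decode(out[0], skip_special_tokens=True))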

COMMENTS • 29

  • @NicolasEmbleton
    @NicolasEmbleton 2 months ago +6

    Yes, more Mamba demos please.

  • @Person-hb3dv
    @Person-hb3dv 2 months ago +4

    Would be interesting to see benchmark numbers for the Mamba models vs. same-size transformer-based models.

    • @aifluxchannel
      @aifluxchannel  2 months ago +1

      We'll have to wait and see. Sometimes these models benchmark wildly differently, especially when you start to look at how long-context-window scaling works out.
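
      For anyone who wants to run that comparison themselves, here is a minimal sketch using EleutherAI's lm-evaluation-harness; the repo ids are illustrative placeholders, not confirmed checkpoint names.

          # Sketch: score a Mamba-style model and a same-size transformer on a
          # standard benchmark. Repo ids below are illustrative assumptions.
          import lm_eval

          for repo in ["nvidia/mamba2-hybrid-8b-3t-4k",   # assumed id
                       "meta-llama/Meta-Llama-3-8B"]:
              results = lm_eval.simple_evaluate(
                  model="hf",
                  model_args=f"pretrained={repo}",
                  tasks=["hellaswag"],
              )
              print(repo, results["results"]["hellaswag"])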

  • @novantha1
    @novantha1 2 months ago +7

    Nvidia will literally train a state-of-the-art AI model to use VRAM more effectively before letting us pay the $100 more to go from 24GB consumer cards to 32GB consumer cards lmfao.

    • @aifluxchannel
      @aifluxchannel  2 months ago

      Quite literally haha. Granted, we all know we'll have to pay more than $100 to go from 24GB to 32GB on RTX cards :(

  • @ryzikx
    @ryzikx 2 months ago +1

    I haven't heard many people talk about Nemotron either. Nvidia really be low-key dropping some insane fking stuff. Thanks for the news!
    Nemotron-type models are going to be the future as we close in on the limit of natural high-quality data.

  • @Nadavot
    @Nadavot 1 month ago +1

    Wasn't AI21's Jamba the first Mamba-Transformer hybrid?

  • @fontenbleau
    @fontenbleau 2 months ago +8

    A horrible scandal with Stable Diffusion 3's new licensing terms, even a ban on use on certain platforms. People see that with such a license, the next buyer of Stable Diffusion will also get sweeping rights over the models and everything made with them. "Open models" was a lie; maybe lawless China will be the only oasis for them.

    • @xlr555usa
      @xlr555usa 2 months ago

      Open-source models can be forked; this happens all the time. The core functionality is there and can be built upon. Don't sit around waiting for China to screw everything up. The CCP controls China and has become a pariah on the world stage.

  • @southcoastinventors6583
    @southcoastinventors6583 2 months ago +1

    Less is more when it comes to LLMs. I hope they used more than the Snake-game implementations to train this model. Also, Megatron is the leader of the Decepticons, while Megaton is a city in the Capital Wasteland. It seems people love their favorite shows or deadly snakes. Please run the model through your normal tests.

  • @jonmichaelgalindo
    @jonmichaelgalindo 2 months ago +1

    Everyone talks about Mamba and no one tests it, because we all know SSMs don't work.
    A giant context window? Oh, let me try needle in the hayst--Oh it can't copy from the prompt to the output. :-|
    Alright, let me just have it convert this data into JSON--No copying from prompt to output! Oh, right.
    Well, then let's just do function calling. Here are the functions that--NO COPYING FROM PROMPT TO OUTPUT!
    Oh... Right. Uhm...

    • @mira_nekosi
      @mira_nekosi 2 months ago

      But Mamba2-Hybrid can; AFAIK it was shown in the paper.
      Also, IMO TOVA could help with limiting the KV cache without affecting such abilities much.

    • @jonmichaelgalindo
      @jonmichaelgalindo 2 months ago

      @@mira_nekosi Well yeah, Mamba2-Hybrid uses attention. That's where transformers get their power.

    • @mira_nekosi
      @mira_nekosi 2 months ago +1

      @@jonmichaelgalindo I know, but it's much faster and uses less memory.
      Also, IMO the performance loss with TOVA will be even smaller than in transformers, especially with finetuning for it.
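
      Since TOVA comes up twice in this thread: it is the cache-eviction rule from the "Transformers are Multi-State RNNs" paper, which keeps the KV cache at a fixed budget by dropping the position that the newest query attends to least. A rough sketch follows; the tensor shapes and names are our own illustrative assumptions.

          # TOVA-style KV-cache eviction sketch; shapes and names are assumptions.
          import torch

          def tova_evict(keys, values, attn, budget):
              # keys/values: [seq, dim]; attn: [heads, seq] attention weights
              # from the latest decoding step. Keep the cache within `budget`.
              if keys.size(0) <= budget:
                  return keys, values
              scores = attn.mean(dim=0)              # average over heads
              drop = scores.argmin()                 # least-attended position
              keep = torch.arange(keys.size(0)) != drop
              return keys[keep], values[keep]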

  • @Wobbothe3rd
    @Wobbothe3rd 2 months ago +3

    Recurrent Neural Networks return!!!!

  • @jmirodg7094
    @jmirodg7094 2 months ago +1

    I'm curious to see how this new Mamba 2 performs against Llama 3.

    • @aifluxchannel
      @aifluxchannel  2 months ago +1

      Right now Mamba is more of an academic/research endeavor. Hopefully we'll see reasonable evals this week, though I think for now, although Mamba uses less compute, Llama 3 is likely still more practically capable.

    • @joech1065
      @joech1065 2 months ago +2

      It's trained on 3.5 trillion tokens, Llama 3 on 15 trillion, so it can't possibly perform as well. I really hope they take a Mamba model and train it adequately to match the level of training that current SOTA transformer models get.

  • @SiCSpiT1
    @SiCSpiT1 2 months ago

    I'm going to make the prediction that your mid-range GPU pick is a 4060 Ti 16GB.

    • @aifluxchannel
      @aifluxchannel  2 months ago +1

      Keep an eye out for our next video ;)