🐍 Mamba2 8B Hybrid 🚀: NVIDIA stealth-drops their latest Mamba2 model!
- Published Sep 12, 2024
- The new Nvidia Mamba2 8B Hybrid LLM is here, and it's shaking things up. This video dives deep into the advancements it brings, potentially offering faster performance than traditional Transformer models. Could this be the future of large language models? We'll explore the rumors, specs, and what this means for the field of NLP. Join us to unravel the mystery of the Nvidia Mamba2 8B Hybrid LLM!
Tell us what you think in the comments below!
------------------------
Mamba 2 Hybrid 8B Hugging Face Card: huggingface.co...
Mamba-2 Release: x.com/_albertg...
Faro-Yi-9B-DPO: x.com/01AI_Yi/...
Yes, more Mamba demos please.
Would be interesting to see benchmark numbers for the mamba models vs same-size transformer-based models
We'll have to wait and see. Sometimes these models benchmark wildly differently, especially once you start looking at how long-context window scaling works out.
Nvidia will literally train a state of the art AI model to use VRAM more effectively before letting us pay the $100 more to go from 24GB consumer cards to 32GB consumer cards lmfao.
Quite literally haha. Granted, we all know we'll have to pay more than $100 to go from 24GB to 32GB on RTX cards :(
I haven't heard many people talk about Nemotron either. NVIDIA really be low-key dropping some insane stuff. Thanks for the news!
nemotron-type models are going to be the future as we close in on the limit of natural high-quality data.
Wasn't AI21's Jamba the first Mamba-Transformer hybrid?
A horrible scandal with the new Stable Diffusion 3 licensing terms, even a ban on use on certain platforms. People see that with such a license, the next buyer of Stable Diffusion would also get incredible rights over the models and everything made with them. Open models were a lie; maybe lawless China will be the only oasis for such things.
The open source models can be forked, this happens all the time. The core functionality is there and can be built upon. Don't sit around for China to screw everything up. The CCP controls China and has become a pariah on the world stage.
Less is more when it comes to LLMs. I hope they used more than snake-game implementations to train this model. Also, Megatron is the leader of the Decepticons, while Megaton is a city in the Capital Wasteland; seems people love their favorite shows, or deadly snakes. Please run the model through your normal tests.
Everyone talks about Mamba and no one tests it, because we all know SSMs don't work.
A giant context window? Oh, let me try needle in the hayst--Oh it can't copy from the prompt to the output. :-|
Alright, let me just have it convert this data into JSON--No copying from prompt to output! Oh, right.
Well, then let's just do function calling. Here are the functions that--NO COPYING FROM PROMPT TO OUTPUT!
Oh... Right. Uhm...
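The "no copying from prompt to output" complaint above is easy to probe with a toy harness. Here's a minimal sketch: `generate` is an assumed `prompt -> str` callable standing in for any model; the filler text, key format, and `echo_model` stand-in are all illustrative placeholders, not part of any real eval suite.

```python
import random
import re
import string

def make_copy_probe(n_chars=16, filler_len=500):
    """Build a prompt that asks the model to echo a random key buried in filler."""
    key = "".join(random.choices(string.ascii_lowercase, k=n_chars))
    filler = " ".join("lorem" for _ in range(filler_len))
    prompt = (
        f"{filler}\nThe secret key is {key}.\n{filler}\n"
        "Repeat the secret key exactly:"
    )
    return prompt, key

def copy_probe_score(generate, trials=5):
    """Fraction of trials where the model's output contains the buried key."""
    hits = 0
    for _ in range(trials):
        prompt, key = make_copy_probe()
        if key in generate(prompt):
            hits += 1
    return hits / trials

# Stand-in "model" that perfectly recalls the key, just to exercise the harness:
def echo_model(prompt):
    m = re.search(r"secret key is (\w+)", prompt)
    return m.group(1) if m else ""

print(copy_probe_score(echo_model))  # 1.0 for the perfect echo model
```

A pure-SSM model with a fixed-size state has to compress that key into its hidden state, which is exactly where these probes tend to fail; hybrid layers with attention can read it back directly.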
But Mamba2-Hybrid can; AFAIK it was shown in the paper.
also, imo TOVA could help with limiting KV-cache without affecting such abilities much
@mira_nekosi Well yeah, Mamba2-Hybrid uses attention. That's where transformers get their power.
@jonmichaelgalindo I know, but it's much faster and uses less memory.
also, imo performance loss with TOVA will be even less than in transformers, especially with finetuning for it
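For readers unfamiliar with TOVA (Token Omission Via Attention): the idea is to cap the KV-cache by evicting the cached token that received the least attention from the newest query. A minimal sketch with NumPy, assuming per-token attention weights are available; array shapes and the `budget` parameter are illustrative, not the paper's exact formulation:

```python
import numpy as np

def tova_evict(keys, values, attn_weights, budget):
    """Cap the KV-cache at `budget` entries.

    keys, values: (n_tokens, d) cached key/value vectors.
    attn_weights: (n_tokens,) attention each cached token got from the latest query.
    Evicts the least-attended token until the cache fits (TOVA-style policy).
    """
    while keys.shape[0] > budget:
        drop = int(np.argmin(attn_weights))
        keys = np.delete(keys, drop, axis=0)
        values = np.delete(values, drop, axis=0)
        attn_weights = np.delete(attn_weights, drop)
    return keys, values, attn_weights

# Usage: a 5-token cache squeezed down to 3 entries.
keys = np.arange(10, dtype=float).reshape(5, 2)
values = keys.copy()
w = np.array([0.1, 0.5, 0.05, 0.2, 0.15])
k2, v2, w2 = tova_evict(keys, values, w, budget=3)
print(k2.shape)  # (3, 2) -- tokens with weights 0.05 and 0.1 were dropped
```

The appeal in a hybrid model is that the attention layers keep their exact-recall ability while the cache stays bounded, which is the trade-off being debated above.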
Recurrent Neural Networks return!!!!
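The comparison to RNNs is apt: at inference time a (non-selective, linear) state-space layer is just a recurrence with a fixed-size hidden state, so memory stays O(1) in sequence length instead of growing like a transformer's KV-cache. A toy sketch of that recurrence (real Mamba layers make A, B, C input-dependent and use a parallel scan; this is only the skeleton):

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Linear state-space recurrence over a scalar input sequence:

        h_t = A @ h_{t-1} + B * x_t
        y_t = C @ h_t

    The state h has fixed dimension, so memory does not grow with len(xs).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B * x
        ys.append(C @ h)
    return np.array(ys)

# Usage: a 2-dim state with decay 0.5 produces an exponentially fading echo.
ys = ssm_scan(0.5 * np.eye(2), np.ones(2), np.ones(2), [1.0, 0.0, 0.0])
print(ys)  # [2.  1.  0.5]
```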
I'm curious to see how this new Mamba 2 performs against Llama 3.
Right now Mamba is more of an academic/research endeavor. Hopefully we'll see reasonable evals this week, although I think for now, even though Mamba uses less compute, Llama 3 is likely still more practically capable.
It's trained on 3.5 trillion tokens, Llama 3 on 15 trillion, so it can't possibly perform as well. I really hope they take a Mamba model and train it adequately, to match the level of training current SOTA transformer models have.
I'm going to make the prediction your mid range GPU pick is a 4060ti 16GB.
Keep an eye out for our next video ;)