Might be a bit late about the Llama-3.2 news as usual, oops. Working on the new Claude 3.5 Sonnet video now (see u in 3 weeks lol)
and check out NVIDIA's Llama-3.2 resources 👇
Jetson AI Lab: nvda.ws/4eO2VFU
Multimodal RAG with Llama 3.2 GitHub: nvda.ws/4eyspY0
NVIDIA NIM for Developers: nvda.ws/4dCXys1
You missed Pixtral 12B, a literal gem when it comes to multimodality. It is miles ahead of Llama 3.2 11B and comparable to Llama 3.2 90B.
Pixtral is actually insane. It flew under the radar so hard it isn't even funny. AFAIK no major inference backend supports a quantized version of it ;-;
NVIDIA as a partner goes crazy, I am happy you are getting the attention you deserve 👍
I assume everyone will have an LLM running on their phones locally in 3-5 years
I run phi 3.5 mini locally on my phone, it is really good for the size and runs pretty well too.
What does it help you do? I’m a noob just getting into this. Would love to know what kind of augmentation it would have on the workflow of an average Joe.
What do you mean "I run it on my phone"?
What interface do you use? How do you manage the models?
@@DriftJunkie they have apps that handle all that now, you're living in the past lol
@@countofst.germain6417 name pls
@@countofst.germain6417 can you gimme the name of the app which I can use to run it on a phone?
RAM on mobile devices isn't actually shared between the CPU and GPU like that. They do have dedicated VRAM, it's just on the order of ~12 KB or so (literally orders of magnitude less than the frame buffer takes up, so it's usually not mentioned). The GPU draws the screen as a sequence of tiles, so it never needs the whole frame buffer at once anyway, but it does spend a ton of time transferring data from RAM to tile memory just to render each screen frame. Some older phones let you drop the resolution to half or a quarter of native to save power, because it drastically reduces the amount of RAM-to-VRAM transfers it takes to render the framebuffer on the GPU.
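The resolution trick is just arithmetic: halving both dimensions quarters the pixels the GPU has to move per frame. A quick back-of-envelope sketch (the panel size, bytes per pixel, and refresh rate here are assumed example numbers, not from the comment):

```python
# Rough arithmetic for why halving resolution cuts framebuffer traffic.
# Assumed numbers: 1080x2400 panel, 4 bytes/pixel (RGBA8888), 60 Hz.

def framebuffer_bytes_per_second(width, height, bytes_per_pixel=4, hz=60):
    """Bytes the GPU must move per second just for the framebuffer."""
    return width * height * bytes_per_pixel * hz

full = framebuffer_bytes_per_second(1080, 2400)   # native resolution
half = framebuffer_bytes_per_second(540, 1200)    # half resolution each axis

# Half the linear resolution means a quarter of the pixels, so a
# quarter of the per-frame transfer traffic.
assert half * 4 == full
print(f"full: {full / 1e6:.0f} MB/s, half-res: {half / 1e6:.0f} MB/s")
```

With these example numbers that's roughly 622 MB/s at native versus ~156 MB/s at half resolution, every second, just to push the framebuffer.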
The > signs are pointing the wrong direction, and Llama 3.2 is the same as 3.1 but with vision: no difference in the text weights, so it's not trained from a larger model
Its like they say: Thats cray cray
that's completely delulu
Pixtral, Qwen VL, Phi, there's so many.
There's an open one that can ingest videos too, forgot the name.
Sadly, ask any of them to OCR Japanese pages and they can't do it properly.
LLAMA in ComfyUI = interesting
Why aren't you covering 1-bit LLMs? It sounds very promising and the tests are testifying to that too.
Because it's been 8 months and there are still no publicly available models that can compete with even old models, not to mention they need to run at native precision plus pay a decompression cost anyway due to lack of hardware support?
@@bits360wastaken
Hmm, you're missing my point though.
Even though I agree about the lack of competitive publicly available models, that's not what I'm asking. I'm talking about coverage of the overall technology, since it was open-sourced with a huge update lately, with great improvements and promising potential.
Granted, besides the llama3-8B-1.58Bit-100B-tokens model, which can literally run on a single core at 6-7 tokens per second, there's no public model as good as mainstream models. But generally, 1-bit quantization has been shown to get close to floating-point precision while being a lot more performant and efficient.
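For anyone wondering what "1.58-bit" means in practice, here's a minimal sketch of ternary weight quantization in the style of BitNet b1.58, where each weight becomes one of {-1, 0, +1} times a single per-tensor scale. This is an illustration of the idea, not any lab's actual code:

```python
# Minimal sketch of 1.58-bit (ternary) weight quantization:
# weights map to {-1, 0, +1} times one absmean scale per tensor.
import numpy as np

def quantize_ternary(w: np.ndarray):
    """Round weights to {-1, 0, 1} after scaling by the mean absolute value."""
    scale = np.mean(np.abs(w)) + 1e-8        # per-tensor "absmean" scale
    q = np.clip(np.round(w / scale), -1, 1)  # ternary codes
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from ternary codes."""
    return q.astype(np.float32) * scale

w = np.array([0.9, -0.05, -1.2, 0.4], dtype=np.float32)
q, s = quantize_ternary(w)       # q == [1, 0, -1, 1]
w_hat = dequantize(q, s)         # approximate reconstruction of w
```

Since each weight needs only log2(3) ≈ 1.58 bits, and matmuls against {-1, 0, 1} reduce to additions and subtractions, that's where the efficiency claims come from.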
I would love to see a video talking about how the hell these AI labs are making multimodal models. It's obviously the direction the industry has been moving in since GPT-4o: increasing scale on text-only data is getting harder and harder, so including other modalities is the next obvious direction to scale in, and it could lead to models that generalise much better.
Definitely do a video on multi-modality, thanks!
In those models, the image tokens are not fed through cross-attention but are instead provided alongside the text as input.
Llama uses cross attention. Qwen is as you said.
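To make the two wiring styles concrete, here's a toy NumPy sketch. The tiny shapes and the attention helper are made up for illustration, not either model's real architecture:

```python
# Toy sketch of the two common ways to feed image features into an LLM.
import numpy as np

def attention(q, k, v):
    """Plain scaled dot-product attention with a softmax over keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8
text = rng.standard_normal((5, d))   # 5 text token embeddings
img = rng.standard_normal((3, d))    # 3 image features from a vision encoder

# Style 1 (Llama 3.2 style): cross-attention.
# Text tokens are the queries; image features supply keys/values.
out_cross = attention(text, img, img)        # shape (5, d)

# Style 2 (Qwen-VL style): image tokens are spliced into the input
# sequence and everything self-attends together.
seq = np.concatenate([img, text], axis=0)    # shape (8, d)
out_self = attention(seq, seq, seq)          # shape (8, d)
```

Cross-attention keeps the text stack's sequence length unchanged (extra layers bolted on the side), while the spliced-tokens approach just makes the sequence longer.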
Could you please make a video about running these models in INT8 for local inference? There seems to be no content on the internet about running the 1B and 3B models quantized locally.
they just released some new quants
x.com/AIatMeta/status/1849469912521093360
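Until a video covers it, here's the core of what INT8 weight quantization does: a minimal per-channel absmax sketch in NumPy. Names and shapes are illustrative, not any particular library's API:

```python
# Minimal sketch of symmetric per-channel INT8 weight quantization,
# the basic scheme behind most "run it in INT8 locally" setups.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Absmax quantization to int8, one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights: q * scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)  # toy (out, in) weights
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)

# Per-element error is bounded by half a quantization step.
assert np.all(np.abs(w - w_hat) <= (s / 2) + 1e-6)
```

The weights shrink 4x versus FP32 (2x versus FP16); at inference time you either dequantize on the fly or use INT8 matmul kernels where the hardware has them.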
The 11B and 90B aren't distilled; they're the 8B and 70B with vision encoders on top. Yeah, a 20B vision encoder on the big one.
The < should be >. Nice video tho
no, 90B is 70B and 11B is 8B, you didn't pay attention to the papers, dude. the extra parameters are the vision adapter
I'm pretty sure they only said the image adapter is 100B for the 405B model. As for the 90B and 11B, they didn't clarify how they did it.
I just wish they would focus on efficiency and smaller models at some point; we are reaching a point where this is getting out of reach of the common mortal and their hardware
It makes sense for the model to be good at summarizing social media posts, because Meta uses their platforms as data 😂
Qwen 0.5B to 70B all have 130k tokens, I think that's what you haven't heard of 😅
Man 😎 you missed the lord of the underworld, Molmo by Ai2
For the life of me I don't understand why the model sizes go from tiny to huge with nothing in between. Why don't they make models that fully utilize 24GB RTX cards?
I wonder when something is going to convince me to abandon mistral.
Mistral don’t give af 😅
Try questions with wrong preposition. Mistral and Gemma cannot handle them. Llama and Qwen can.
it should've been "Boy Cloud" 😁😹✌
Is the video sponsored by nvidia?
Talk about nvidia new model
In my opinion the 3.2 series is unusable; it's extremely censored to a point that's absurd. I thought for a second that Meta did something cool, but once I saw 3.2: NOPE.
Gosh I'm so sick of this nerd talk