Great video as always. I've been working with these models for a few days now and they are very impressive. I had a few issues with the Q4s and even the Q8, but the fp16 is excellent and fast too.
To me, multilingual training is obviously going to improve reasoning and knowledge; the training includes different cultures, logics, and definitions. I can't see how we could achieve the goal of AGI without multimodal and polylingual models.
Like Mistral, but (currently) better, this shows the world that AI is not a US-only game. I welcome Qwen to the top of my LLM library.
Interesting, now we have two really good models this year, Llama 3 and Qwen 2. Hopefully we'll see another version of my other favorite model, Yi. It would be cool if someone did a weight merge of these models. The best OS model for some time was a Nous Research merge.
I'm testing function calling now along with some agentic tasks. Very curious to see how this model performs in these areas. What models do you generally daily drive for work?
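For anyone curious, this is roughly the shape of my function-calling test. Just a sketch, assuming a local OpenAI-compatible server (vLLM or similar) hosting the instruct model; the base_url, model name, and get_weather tool are placeholders for my setup:

from openai import OpenAI

# Assumes a local OpenAI-compatible endpoint (vLLM, llama.cpp server, etc.)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# A hypothetical single-tool schema, to see whether the model emits a structured call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen2-72B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)
# If function calling works, this prints the parsed tool call instead of prose
print(resp.choices[0].message.tool_calls)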
It would be awesome to see a Qwen 400B, a jab at the Llama 3 400B we won't get!
Fingers crossed we still get Llama 3 400B eventually; hopefully Yann wasn't bluffing when he dispelled the rumors that it was never going to be released publicly.
Gonna try running it in full fp16 locally tomorrow, and I'm going to ask it a lot of programming, science, and political questions.
WizardLM Mixtral 8x22B is pretty amazing, and I've got my doubts that Qwen 2 72B will beat it.
Let us know how it goes! What kind of hardware are you using to run it locally at full precision?
My concern is that they didn't mention how many tokens it was trained on. It might perform well on basic benchmarks, but it could fail on more difficult tasks.
Haha, then they'd have to admit all of the US copyrighted content they used to train it!
@@aifluxchannel Innocent until proven guilty!😝
I don't agree with the restrictions on selling GPUs to China. All that means is that Chinese programmers develop on Huawei rather than Nvidia, and that doesn't help the US at all. If anything, it probably speeds up China catching up to the USA in AI hardware!
Eh, Nvidia shouldn't be giving China a leg up. If they manage to make their own slower GPUs, good for them. They're actually still trying to get 4090 GPU dies into the country by any means necessary.
Brilliant video
Thanks! Let us know what you want to see more of!
I'm running the 72B Q2 version and am surprised how fast it's running. I'm getting 15.79 tokens per second on my system (dual 3090s).
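For reference, this is roughly how I'm loading it. A minimal sketch, assuming llama-cpp-python built with CUDA; the GGUF filename and split ratios are just guesses at a typical setup:

from llama_cpp import Llama

llm = Llama(
    model_path="qwen2-72b-instruct-q2_k.gguf",  # whatever Q2 quant you downloaded
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # spread the weights evenly across the two 3090s
    n_ctx=8192,
)
out = llm("Q: What is 17 * 24? A:", max_tokens=16)
print(out["choices"][0]["text"])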
Wow, that's incredible performance for 2x 3090.
Isn't it more or less braindead at q2?
Thanks for the video, I actually wanted to hear your thoughts on this. If you could only use two of the following four LLMs for a SaaS project, which would you pick and why?
Phind-CodeLlama (Phind-CodeLlama-34B-Python-v1), Deepseek-coder-7B-instruct, CodeLlama-34B and Qwen 2 72B. 😁
I'd probably still opt for Deepseek Coder, but I actually haven't tried the latest version of Phind CodeLlama. Which of these models is your go-to for coding?
@@aifluxchannel interesting answer. No trust for the Chinese ay? lol
I'd go for both Phind-CodeLlama and Deepseek-Coder. One for generation and the other for review/debugging.
Pretty cool
Thanks! What LLMs do you run locally?
@@aifluxchannel Llama 3 8B with 32k context and Mistral 7B v0.3
thank you!
You're welcome! Thanks for watching!
The 0.5B literally sucks, and phones aren't ready for it either.
I agree, small models like this don't make sense for now, especially with the ability to stream from Groq over a basic 3G connection.
It's 3GB. Phi-3 by Microsoft is smaller still.
Good point! Have you compared these models yet?
Safety is bullshit. But I tested this model and I'm not that impressed so far. Super hallucinations.
What prompts did you attempt to use? Was this on their 72B model or the smaller 7b model?
@@aifluxchannel nah, can't run the big one. I tried the 7B model; I do creative writing, and even with info placed in the context it hallucinated badly.
@@VanSocero Did you try to lower the temperature, maybe even to 0 for testing?
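Something like this, assuming the Ollama Python client and that the 7B is pulled under the qwen2:7b tag (both are assumptions about your setup):

import ollama

# Temperature 0 makes sampling (near-)deterministic, which helps separate
# genuine hallucination from random sampling noise.
resp = ollama.chat(
    model="qwen2:7b",  # use whatever tag you actually pulled
    messages=[{"role": "user", "content": "Using only the notes I gave you, summarize the plot."}],
    options={"temperature": 0},
)
print(resp["message"]["content"])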
I tried the model and it is great on tests but quite bad at logic.
Which version were you using? I used the 7B and 72B models, and my impression was that it's roughly equivalent to Mixtral 8x7B.
I tried the 72B version in 5-bit and 8-bit (GGUF). It claims its training ended in 2021, then goes on to explain facts from 2022 and does not see the contradiction when asked about it.
Its multilingual capabilities are better than Llama 3 70B's, but its logic capabilities are far worse, unfortunately (in my opinion).
Great video! I can't wait until I get my Tesla K80 to try this in half precision (I also have an RTX 3060 12GB).
edit: I will have 36GB VRAM total, plus 64GB RAM.
Interesting config! I'm seeing more and more people with 4x 3060 machines, or using P40s to quantize and then deploy on 3060 machines. Would you be open to sharing more of your workflow sometime soon?
@@aifluxchannel sure, I use Ollama for running LLMs, and I like to use Open WebUI because I hate writing HTML lol
@@aifluxchannel the Tesla K80 didn't fit, and now I need a new case, motherboard, and probably a CPU to match the motherboard :(
Interesting, did anyone try to see if there is any politically induced bias?
Why are you assuming Chinese models simply don't care about safety, when OpenAI and other US-based companies don't necessarily have a great track record themselves? You should get your bias checked before you review more non-US models.
I disclosed my bias: China has a distinct history of not respecting Western IP laws.
@@aifluxchannel I don't see what IP laws have to do with AI safety, especially since we're talking about an open-source model with the best license terms. Also, these are individual tech companies acting in their corporate interests; simply labeling all of them as 'China' is grossly oversimplifying. Lastly, if you look back two centuries, the US rose by not respecting UK IP laws and copying the best of Britain's industrial progress.
@@aifluxchannel that's it. That's all; neither I nor anybody else needs to listen to or watch your nonsense.
Yeah, I would think twice before using a Chinese model for writing any software that may be used in sensitive projects. No need to automate putting backdoors in.
I'd worry more about the inference code they supply. It's hard to hide anything too dangerous in weights, especially in safetensors format! But yeah, unless it's on your own GPU, always be careful sharing code, even without secrets / passkeys. What kind of work do you use LLMs for?
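To illustrate the weights point: a safetensors file deserializes to plain tensors, with no pickle step that could run code. A minimal sketch (the shard filename here is hypothetical):

from safetensors.torch import load_file

# Unlike pickle-based .bin checkpoints, this only parses raw tensor data;
# there's no code path for the file to execute arbitrary Python on load.
state_dict = load_file("model-00001-of-00037.safetensors")
for name, tensor in list(state_dict.items())[:3]:
    print(name, tuple(tensor.shape), tensor.dtype)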
For work, you shouldn't rely on AI-generated code that you don't understand; that's the lesson I learned, at least.