Great video as always. I've been working with these models for a few days now and they are very impressive. I had a few issues with the Q4s and even the Q8, but the fp16 is excellent and fast too.
To me, multilingual training is obviously going to improve reasoning and knowledge; the training includes different cultures, logics, and definitions. I can't see how we could achieve the goal of AGI without multimodal and polylingual models.
Like Mistral, but (currently) better, this shows the world that AI is not a US-only game. I welcome Qwen to the top of my LLM library.
Interesting, now we have two really good models this year, Llama 3 and Qwen 2. Hopefully we'll see another version of my other favorite model, Yi. It would be cool if someone did a weight merge of these models. The best OS model for some time was a Nous Research merge.
I'm testing function calling now along with some agentic tasks. Very curious to see how this model performs in these areas. What models do you generally daily drive for work?
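For anyone curious, this is roughly the shape of my function-calling test. Just a sketch, assuming a local OpenAI-compatible server (vLLM or similar) hosting the instruct model; the base_url, model name, and get_weather tool are placeholders for my setup:

from openai import OpenAI

# Assumes a local OpenAI-compatible endpoint (vLLM, llama.cpp server, etc.)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

# A hypothetical single-tool schema, to see whether the model emits a structured call
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen2-72B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)
# If function calling works, this prints the parsed tool call instead of prose
print(resp.choices[0].message.tool_calls)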
It would be awesome to see a Qwen 400B, a jab at the Llama 3 400B we won't get!
Fingers crossed we still get Llama 3 400B eventually; hopefully Yann wasn't bluffing when he dispelled the rumors that it was never going to be released publicly.
Gonna try running it in full fp16 locally tomorrow, and I'm going to ask it a lot of programming, science, and political questions.
WizardLM Mixtral 8x22B is pretty amazing, and I've got my doubts that Qwen 2 72B will beat it.
Let us know how it goes! What kind of hardware are you using to run it locally at full precision?
My concern is that they didn't mention how many tokens it was trained on. It might perform well on basic benchmarks, but it could fail on more difficult tasks.
Haha, then they'd have to admit all of the US copyrighted content they used to train it!
@@aifluxchannel Innocent until proven guilty!😝
I don't agree with the restrictions on selling GPUs to China. All that means is that Chinese programmers develop on Huawei rather than Nvidia, and that doesn't help the US at all. If anything, it probably speeds up China catching up to the USA in AI hardware!
Eh, Nvidia shouldn't be giving China a leg up. If they manage to make their own slower GPUs, good for them. They're actually still trying to get 4090 GPU dies into the country by any means necessary.
Brilliant video
Thanks! Let us know what you want to see more of!
I'm running the 72B Q2 version and am surprised how fast it's running. I'm getting 15.79 tokens per second on my system (dual 3090s).
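For reference, this is roughly how I'm loading it. A minimal sketch, assuming llama-cpp-python built with CUDA; the GGUF filename and split ratios are just guesses at a typical setup:

from llama_cpp import Llama

llm = Llama(
    model_path="qwen2-72b-instruct-q2_k.gguf",  # whatever Q2 quant you downloaded
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # spread the weights evenly across the two 3090s
    n_ctx=8192,
)
out = llm("Q: What is 17 * 24? A:", max_tokens=16)
print(out["choices"][0]["text"])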
Wow, that's incredible performance for 2x 3090.
Isn't it more or less braindead at q2?
Thanks for the video, I actually wanted to hear your thoughts on this. If you could only use two of the following four LLMs for a SaaS project, which would you pick and why?
Phind-CodeLlama (Phind-CodeLlama-34B-Python-v1), Deepseek-coder-7B-instruct, CodeLlama-34B and Qwen 2 72B. 😁
I'd probably still opt for Deepseek Coder, but I actually haven't tried the latest version of Phind CodeLlama. Which of these models is your go-to for coding?
@@aifluxchannel interesting answer. No trust for the Chinese ay? lol
I'd go for both Phind-CodeLlama and Deepseek-Coder. One for generation and the other for review/debugging.
Pretty cool
Thanks! What LLMs do you run locally?
@@aifluxchannel Llama 3 8B with 32k context and Mistral 7B v0.3
thank you!
You're welcome! Thanks for watching!
The 0.5B literally sucks, and phones aren't ready for it either.
I agree, small models like this don't make sense for now, especially with the ability to stream from Groq over a basic 3G connection.
It's 3GB. Phi-3 by Microsoft is smaller still.
Good point! Have you compared these models yet?
Safety is bullshit. But I tested this model and I'm not that impressed so far. Super hallucinations.
What prompts did you attempt to use? Was this on their 72B model or the smaller 7b model?
@@aifluxchannel nah, can't run the big one. I tried the 7B model; I do creative writing, and even with info placed in the context it hallucinated badly.
@@VanSocero Did you try to lower the temperature, maybe even to 0 for testing?
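Something like this, assuming the Ollama Python client and that the 7B is pulled under the qwen2:7b tag (both are assumptions about your setup):

import ollama

# Temperature 0 makes sampling (near-)deterministic, which helps separate
# genuine hallucination from random sampling noise.
resp = ollama.chat(
    model="qwen2:7b",  # use whatever tag you actually pulled
    messages=[{"role": "user", "content": "Using only the notes I gave you, summarize the plot."}],
    options={"temperature": 0},
)
print(resp["message"]["content"])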
I tried the model and it is great on tests but quite bad at logic.
Which version were you using? I used the 7B and 72B models, and my impression was that it's roughly equivalent to Mixtral 8x7B.
I tried the 72B version in 5-bit and 8-bit (GGUF). It claims its training ended in 2021, then goes on to explain facts from 2022 and does not see the contradiction when asked about it.
Its multilingual capabilities are better than Llama 3 70B's, but its logic capabilities are far worse, unfortunately (in my opinion).
Great video! I can't wait until I get my Tesla K80 to try this in half precision (I also have an RTX 3060 12GB).
edit: I will have 36GB VRAM total, plus 64GB RAM.
Interesting config! I'm seeing more and more people with 4x 3060 machines, or using P40s to quantize and then deploy on 3060 machines. Would you be open to sharing more of your workflow sometime soon?
@@aifluxchannel sure, I use Ollama for running LLMs, and I like to use Open WebUI because I hate writing HTML lol
@@aifluxchannel the Tesla K80 didn't fit, and now I need a new case, motherboard, and probably a CPU to match the motherboard :(
Interesting, did anyone try to see if there is any politically induced bias?
Why are you assuming Chinese models simply don't care about safety, when OpenAI and other US-based companies don't necessarily have a great track record themselves? You should get your bias checked before you review more non-US models.
I disclosed my bias: China has a distinct history of not respecting Western IP laws.
@@aifluxchannel I don't see what IP laws have to do with AI safety, especially since we're talking about an open-source model with the best license terms. Also, these are individual tech companies acting in their corporate interests; simply labeling all of them as 'China' is grossly oversimplifying. Lastly, if you look back two centuries, the US rose by not respecting UK IP laws and copying the best of Britain's industrial progress.
@@aifluxchannel that's it. That's all; neither I nor anybody else needs to listen to or watch your nonsense.
Yeah, I would think twice before using a Chinese model for writing any software that may be used in sensitive projects. No need to automate putting backdoors in.
I'd worry more about the inference code they supply. It's hard to hide anything too dangerous in weights, especially in safetensors format! But yeah, unless it's on your own GPU, always be careful sharing code, even without secrets / passkeys. What kind of work do you use LLMs for?
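To illustrate the weights point: a safetensors file deserializes to plain tensors, with no pickle step that could run code. A minimal sketch (the shard filename here is hypothetical):

from safetensors.torch import load_file

# Unlike pickle-based .bin checkpoints, this only parses raw tensor data;
# there's no code path for the file to execute arbitrary Python on load.
state_dict = load_file("model-00001-of-00037.safetensors")
for name, tensor in list(state_dict.items())[:3]:
    print(name, tuple(tensor.shape), tensor.dtype)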
For work, you shouldn't rely on AI-generated code that you don't understand; that's the lesson I learned, at least.