QUESTION: Can you propose a prompt that would show the 400B model gives better results than the 7B version? I use all the LLMs and find them relatively on par. What would the difference in the inferences be?
Depends on the task. Some tasks are fine with 0.5B, others need something larger. 405B will probably be a testing ground, and not something for serious frequent use for now. But hey, Meta said their goal was to let the community optimize so they can reap the benefits.
I'm planning to use it on Groq. I tried running Llama-70B on a rented server with four 4090 cards (I think it was through Vast.ai), and it was quite slow. You can also rent machines with 12 4090 cards, which is 288GB of VRAM; presumably they run each card on 8 PCI-e lanes instead of 16. That's enough to run a 405B model with 4-bit quantization, but I'd expect the output to be moderately slow. It would also be possible to buy a thousand Groq cards from Mouser at $30k per card to run it locally, but it would take a number of racks to host the thing because each card only has 230MB of static RAM.
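A quick back-of-the-envelope check on that 288GB claim, as a minimal Python sketch; the half-byte-per-weight figure for 4-bit quantization and the ~20% overhead margin for KV cache and activations are my own assumptions, not numbers from the thread:

# Rough VRAM estimate for a 405B-parameter model with 4-bit quantized weights.
# Assumptions: ~0.5 bytes per parameter, plus a loose 20% margin for KV cache,
# activations, and runtime overhead (both figures are ballpark guesses).

params_billion = 405
bytes_per_param = 0.5      # 4-bit quantization ~ half a byte per weight
overhead = 1.20            # hypothetical margin for KV cache / activations

weights_gb = params_billion * bytes_per_param   # ~202.5 GB of weights
total_gb = weights_gb * overhead                # ~243 GB all-in

vram_12x4090_gb = 12 * 24                       # 288 GB across twelve RTX 4090s
print(f"weights ~= {weights_gb:.0f} GB, total ~= {total_gb:.0f} GB, "
      f"available ~= {vram_12x4090_gb} GB, fits: {total_gb < vram_12x4090_gb}")

So the weights alone land around 200GB and the whole thing squeezes into 288GB with room to spare, which matches the comment above.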
I generally prefer renting 8x H100 machines from Hyperstack or Lambda over cobbled-together 4090 machines on Vast AI. Pricing seems to be about the same. But I agree, PCIe bandwidth is everything, and even with this many massive GPUs TPS is going to be slow.
@aifluxchannel Makes sense for rentals. I was curious because 12 x 4090s and one H100 both cost around $40k. There could be a batching/training workload where 12 cards is better. It would definitely be a better space heater, but it's not great for TPS.
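For what it's worth, a tiny sketch of that trade-off using the ~$40k figure from the comment above and the standard per-card VRAM sizes (24GB per 4090, 80GB per H100); the prices are only the commenter's ballpark, not real quotes:

# Rough VRAM-per-dollar comparison for the two ~$40k setups mentioned above.
# Prices are ballpark figures from the thread, not actual quotes.

budget_usd = 40_000

vram_4090_gb, n_4090 = 24, 12     # twelve RTX 4090s
vram_h100_gb, n_h100 = 80, 1      # one H100 (80GB variant)

for name, total_gb in [("12x 4090", n_4090 * vram_4090_gb),
                       ("1x H100", n_h100 * vram_h100_gb)]:
    print(f"{name}: {total_gb} GB total VRAM, "
          f"{total_gb / budget_usd * 1000:.1f} GB per $1k")

Raw capacity clearly favors the consumer build (288GB vs 80GB), but that ignores interconnect: the 4090s only talk over PCIe, which is presumably a big part of why TPS suffers.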
What was that Robot head at the end of the video?
We need the AI flux approved ML build!
but how will we use it in whatsapp? as my personal assistant that answers all my messages?
Exclusive to the US
@Nid_All that's so American. They always want to be first.
I hope WhatsApp will let you send/get transcripts of audio messages.
But can it run Crysis?
fingers crossed for apple's ARM + unified memory to alleviate the ridiculous VRAM overhead
Probably the cheapest would be to just buy the tinybox when it comes out.
I do not disagree haha
Whennnnnnnnn?
Soon TM ;)
Darn 😢
DeepSeek V2 is better
Over time, DeepSeek V2's performance relative to its size and cost per token just keeps getting better.
Not on WhatsApp. Please stay out of my messages, Zucker.
Exactly why I don't have this OR messenger on any of my personal devices! But I think we know they're using all of that chat metadata to train future llamas ;)