Llama 3-405B is Coming! 🚀 (on WhatsApp?)

  • Published 28 Oct 2024

COMMENTS • 22

  • @TimPiatek
    @TimPiatek 3 months ago +2

    What was that robot head at the end of the video?

  • @FilmFactry
    @FilmFactry 3 months ago +1

    QUESTION: Can you propose a prompt that would show the 400B model gives better results than the 7B version? I use all the LLMs and find them relatively on par. What would the difference in the inferences be?

    • @blisphul8084
      @blisphul8084 3 months ago

      Depends on the task. Some tasks are fine with a 0.5B model, others need something larger. 405B will probably be a testing ground, not something for serious frequent use for now. But hey, Meta said their goal was to let the community optimize so they can reap the benefits.

  • @nathanbanks2354
    @nathanbanks2354 3 months ago +1

    I'm planning to use it on Groq. I tried running Llama-70B on a rented server with 4x 4090 cards, I think through Vast.ai, and it was quite slow. You can also rent machines with 12x 4090 cards, which is 288GB of VRAM; presumably they use 8 PCIe lanes instead of 16. That's enough to run a 405B model with 4-bit quantization (rough math sketched after this thread), but I'd expect the output to be moderately slow. It would also be possible to buy a thousand Groq cards from Mouser at $30k per card to run it locally, but it would take a number of racks to host the thing because each card has only 230MB of static RAM.

    • @aifluxchannel
      @aifluxchannel  3 months ago +1

      I generally prefer renting 8x H100 machines from Hyperstack or Lambda over cobbled-together 4090 machines on Vast.ai. Pricing seems to be about the same. But I agree, PCIe bandwidth is everything, and even with this many massive GPUs, TPS is going to be slow.

    • @nathanbanks2354
      @nathanbanks2354 3 months ago

      @@aifluxchannel Makes sense for rentals. I was curious because 12x 4090s and one H100 both cost around $40k. There could be a batching/training workload where 12 cards are better. It would definitely be a better space heater, but it's not great for TPS.
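
    A rough back-of-envelope check of the VRAM numbers in this thread (a minimal sketch in Python; the ~20% overhead factor for KV cache and buffers is an assumption, not a measured figure):

        # Rough VRAM estimate for a 405B-parameter model at 4-bit quantization.
        params = 405e9                  # parameter count
        bytes_per_param = 0.5           # 4-bit weights ~= 0.5 bytes each
        weights_gb = params * bytes_per_param / 1e9  # ~202.5 GB of weights
        total_gb = weights_gb * 1.2                  # assume ~20% extra for KV cache and buffers, ~243 GB
        vram_12x4090_gb = 12 * 24                    # 288 GB across 12x RTX 4090 (24 GB each)
        print(f"estimated {total_gb:.0f} GB needed vs {vram_12x4090_gb} GB available")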

  • @drewroyster3046
    @drewroyster3046 3 months ago +2

    We need the AI Flux-approved ML build!

  • @chatgpt4free
    @chatgpt4free 3 months ago +5

    But how will we use it in WhatsApp? As my personal assistant that answers all my messages?

    • @Nid_All
      @Nid_All 3 months ago +1

      Exclusive to the US

    • @chatgpt4free
      @chatgpt4free 3 months ago

      @@Nid_All That's so American. They always want to be first.

  • @atomobianco
    @atomobianco 3 months ago +1

    I hope WhatsApp will let us send/get transcripts of audio messages.

  • @IdPreferNot1
    @IdPreferNot1 3 months ago +1

    But can it run Crysis?

  • @JankJank-om1op
    @JankJank-om1op 3 months ago +1

    Fingers crossed for Apple's ARM + unified memory to alleviate the ridiculous VRAM overhead.

  • @schongut9030
    @schongut9030 3 months ago +1

    Probably the cheapest option would be to just buy the tinybox when it comes out.

  • @GerryPrompt
    @GerryPrompt 3 months ago

    Whennnnnnnnn?

  • @lancemarchetti8673
    @lancemarchetti8673 3 months ago +1

    DeepSeek V2 is better

    • @aifluxchannel
      @aifluxchannel  3 months ago

      With time, the relative performance per token / per unit of size of DeepSeek V2 keeps getting better.

  • @GodFearingPookie
    @GodFearingPookie 3 months ago +4

    Not on WhatsApp. Please stay out of my messages, Zucker.

    • @aifluxchannel
      @aifluxchannel  3 months ago +1

      Exactly why I don't have this OR Messenger on any of my personal devices! But I think we know they're using all of that chat metadata to train future llamas ;)