🦙✨ BigLlama-3.1-1T-Instruct: 1 TRILLION Parameter Llama Merge (Melts GPUs)

  • Published Oct 28, 2024

COMMENTS • 41

  • @gileneusz
    @gileneusz 2 months ago +6

    2:20 "only 681B parameters", what a time to be alive...

  • @VastCNC
    @VastCNC 2 months ago +4

    Superficial comment, but when you record your audio, put a couch in the room or throw up some sound blankets. I can hear the room much more than I should. I only comment this because I love what you put out and there’s an easy win on audio quality. Your mic is decent, the room needs some work.

    • @aifluxchannel
      @aifluxchannel 2 months ago

      I'm traveling and in a temporary location; I'll be back in my studio with proper audio dampening soon, but thanks for the tip!

  • @novantha1
    @novantha1 2 months ago +1

    What's most interesting about frankenmerging isn't necessarily scaling beyond scaling laws (though stuff like this is great fun, too!). Rather, I notice that in creative writing a lot of people report great results with smaller models, for whatever reason. It might be that the extra parameters are like a longer "barrel" on a shotgun, limiting the spread of possible outputs, or it might be that smaller models are able to receive more training relative to their size (there could be important changes, like grokking, that only occur in extreme training scenarios), but regardless, I've seen multiple subjective reports like that.
    With that in mind, I think frankenmerges might have more success with dedicated fine-tunes, continued pre-training, and the like on smaller models (in the 7-13B range).
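
    For a sense of the arithmetic, here is a minimal sketch of how a passthrough merge stacks overlapping layer windows to inflate the parameter count. The slice ranges are made up for illustration, not the actual BigLlama-3.1-1T recipe:

    ```python
    # Passthrough frankenmerge, schematically: the merged model re-uses
    # overlapping windows of a single base model's decoder layers.
    # Slice ranges below are illustrative, NOT the actual BigLlama recipe.

    N_LAYERS = 126               # decoder layers in Llama 3.1 405B
    PARAMS_PER_LAYER = 3.2e9     # very rough: ~405e9 params / 126 layers

    # Each (start, end) window is copied verbatim into the merged stack.
    slices = [(0, 63), (16, 79), (32, 95), (47, 110), (63, 126)]
    assert all(0 <= s < e <= N_LAYERS for s, e in slices)

    merged_depth = sum(e - s for s, e in slices)
    print(f"merged depth : {merged_depth} layers")                   # 315
    print(f"approx params: {merged_depth * PARAMS_PER_LAYER / 1e12:.2f}T")
    ```

    No new weights are trained; the duplicated layers simply run more than once, which is why these merges are cheap to assemble but not guaranteed to gain capability.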

  • @GerryPrompt
    @GerryPrompt 2 months ago +7

    No way I could even run this with 30 3090s 😂

    • @aifluxchannel
      @aifluxchannel 2 months ago

      Not quite!

    • @666-d5y
      @666-d5y 2 months ago +1

      You can run a 4-bit quant of this on 20 3090s though
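
      As a rough sanity check, assuming ~4.5 bits/weight for a q4-style quant and ~10% overhead for KV cache and buffers (illustrative numbers, not benchmarks):

      ```python
      # How many 24 GB RTX 3090s would a 4-bit quant of a 1T model need?
      # Assumptions: ~4.5 bits/weight for a q4-style quant, ~10% extra
      # for KV cache and buffers. Illustrative only.

      def gpus_needed(n_params: float, bits_per_weight: float = 4.5,
                      overhead: float = 1.10, vram_gb: float = 24.0) -> float:
          weights_gb = n_params * bits_per_weight / 8 / 1e9
          return weights_gb * overhead / vram_gb

      print(f"1T   @ 4-bit: ~{gpus_needed(1.0e12):.0f} x 3090")   # ~26
      print(f"405B @ 4-bit: ~{gpus_needed(405e9):.0f} x 3090")    # ~10
      ```

      By this math, 20 cards is slightly short for the 1T merge at 4-bit, though a ~3.5-bit quant would land right around 20.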

  • @gileneusz
    @gileneusz 2 months ago +3

    I would not be surprised if Zuck publishes a 1T+ parameter open-source model, since he will have 100k H100s available for training very soon... 😁
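
    For scale, the standard ~6·N·D rule of thumb for training FLOPs makes that plausible; every number below is an assumption, not a leaked spec:

    ```python
    # Back-of-the-envelope training time via the ~6*N*D FLOPs rule of thumb.
    # All inputs are assumptions: 1T dense params, 15T tokens (Llama-3-scale
    # data), H100 at ~989 TFLOP/s dense BF16, 40% utilization (MFU).

    N = 1.0e12                      # parameters
    D = 15e12                       # training tokens
    flops_needed = 6 * N * D        # ~9e25 FLOPs

    cluster = 100_000 * 989e12 * 0.40   # sustained cluster FLOP/s
    days = flops_needed / cluster / 86_400
    print(f"~{days:.0f} days on 100k H100s")    # ~26
    ```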

    • @florentromanet5439
      @florentromanet5439 2 months ago

      The H100s are delayed by Nvidia, by the way. Not only for Meta; it seems they're having production issues...

  • @gileneusz
    @gileneusz 2 months ago +2

    7:50 Ollama is just upgrading their 405B quants rn...

  • @fontenbleau
    @fontenbleau 2 months ago +4

    I was able to run the 405B GGUF on CPU; it needs just 465 GB of RAM at the best q8 quality (which I have across 12 slots on a server board). The biggest and slowest problem was loading all 465 GB from an ordinary HDD at ~700 MB/s transfer speed, which takes 30 minutes. Once loaded into RAM it works at about 1 token/s, and subsequent loads are much faster. Unfortunately my only 512 GB SSD is busy at the moment, so the hard drive is the only way for now, but also the slowest.
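
    Back-of-the-envelope version of those numbers (q8_0 costs roughly 8.5 bits per weight once quantization scales are included; the disk speeds are typical sustained figures, not measurements from this setup):

    ```python
    # Estimate the GGUF footprint and cold-load time of a 405B model.
    # Assumptions: q8_0 ~8.5 bits/weight including scales; disk speeds
    # are typical sustained figures, not measurements from this thread.

    def gguf_gb(n_params: float, bits_per_weight: float = 8.5) -> float:
        return n_params * bits_per_weight / 8 / 1e9

    def load_min(size_gb: float, disk_mb_s: float) -> float:
        return size_gb * 1000 / disk_mb_s / 60

    size = gguf_gb(405e9)           # ~430 GB on disk, more once resident
    print(f"file size   : ~{size:.0f} GB")
    for name, speed in [("HDD", 200), ("SATA SSD", 550), ("PCIe 4 NVMe", 7000)]:
        print(f"{name:12s}: ~{load_min(size, speed):.1f} min")
    ```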

    • @aifluxchannel
      @aifluxchannel 2 months ago +2

      Time to upgrade to NVMe!

    • @gileneusz
      @gileneusz 2 months ago

      @@aifluxchannel This is why I'm considering PCIe NVMe, which would bump read speeds to 30,000+ MB/s, but the CPU will still be the bottleneck... 1 t/s? Really? Better to use Grok 🥲

    • @fontenbleau
      @fontenbleau 2 months ago

      @@gileneusz Terabyte SSDs are unaffordable at current prices, especially NVMe; new hard drives are really great for keeping a dozen models, but SATA is a bottleneck.

    • @gileneusz
      @gileneusz 2 months ago +2

      @@fontenbleau If you RAID them, you can get better speed even with SATA

    • @fontenbleau
      @fontenbleau 2 months ago +2

      @@gileneusz I see the only solution in distributed computation; there are some hobbyist projects on GitHub that support GGUF inference at scale, where I could combine 2 home servers with CPUs + GPUs

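      Both ideas above (RAID-0 striping and splitting across two servers) attack the same bottleneck, sequential read bandwidth. A rough comparison, assuming ~550 MB/s per SATA SSD and near-linear RAID-0 scaling:

      ```python
      # What RAID-0 and a two-server split buy you for cold-load time.
      # Assumptions: ~550 MB/s per SATA SSD, near-linear RAID-0 scaling,
      # weights split evenly across nodes. Illustrative only.

      MODEL_GB = 465.0                      # the q8 405B footprint from above

      def load_min(size_gb: float, mb_s: float) -> float:
          return size_gb * 1000 / mb_s / 60

      for n in (1, 2, 4):                   # RAID-0 stripes across n SSDs
          print(f"RAID-0 x{n}: ~{load_min(MODEL_GB, 550 * n):.1f} min")

      # Two nodes each load half the layers from their own disks in parallel:
      print(f"2 nodes   : ~{load_min(MODEL_GB / 2, 550):.1f} min per node")
      ```
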
  • @gileneusz
    @gileneusz 2 months ago +2

    1T model is no joke

  • @fontenbleau
    @fontenbleau 2 months ago +1

    I don't think we'll see GPUs with much more VRAM any time soon: the server RAM I bought overheats to 90°C at idle, and its stamped warning "Danger. Surface hot" is no joke. I've managed to reduce it to 60°C.

  • @НиколайКол-е2и
    @НиколайКол-е2и 2 months ago +1

    What a good time to make and sell memory

  • @manonamission2000
    @manonamission2000 2 months ago +2

    not the first 1T+ parameter model

    • @aifluxchannel
      @aifluxchannel 2 months ago +1

      It's a merge, so likely done before. What model are you referring to?

    • @manonamission2000
      @manonamission2000 2 months ago

      @@aifluxchannel GPT-4 is estimated to have roughly 1.8 trillion parameters. Source: G. Hotz, and confirmed by others.
      High temps may have led to exploring the mini version.

  • @leomaxwell972
    @leomaxwell972 2 months ago +1

    Can I run like a trillion models while hopping like a Llama? You bet your Strawberry I can, like it's Automatic!!!!(11111)

  • @Zale370
    @Zale370 2 months ago +1

    Everyone keeps talking about how much better every new LLM release is, but when you use them they're all the same, especially for coding. The only difference I noticed was between GPT-3.5 and GPT-4.

    • @aifluxchannel
      @aifluxchannel 2 months ago +1

      Everyone has different use-cases; I gauge how much these models can accomplish in a single shot and how much (valid) code can be produced in just a few prompts. What do you generally use these models for?

    • @Zale370
      @Zale370 2 months ago +1

      @@aifluxchannel I use Perplexity and play around with the latest models for coding tasks; for summarizing YouTube videos or text I use local models. But as I said, I stopped noticing differences after GPT-4.

    • @robd7724
      @robd7724 2 months ago

      Not even with Claude 3.5?

    • @ryzikx
      @ryzikx 2 months ago +2

      @@robd7724 True, the un-nerfed Claude 3.5 is definitely better

    • @blisphul8084
      @blisphul8084 2 months ago

      @@Zale370 Yeah, even Gemma 2 2B is good for summarizing text. With llama.cpp, I get 100 t/s on an RTX 3060 Ti.

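      That figure is consistent with single-stream decode being memory-bandwidth bound: each generated token streams the full weight file through the GPU once. A rough ceiling, assuming ~448 GB/s on the 3060 Ti, ~2.6B params for Gemma 2 2B, and ~4.5 bits/weight at q4:

      ```python
      # Upper bound on decode speed when reading every weight once per token.
      # Assumptions: RTX 3060 Ti ~448 GB/s memory bandwidth, Gemma 2 2B has
      # ~2.6e9 params, ~4.5 bits/weight for a q4-style quant. Illustrative.

      BANDWIDTH = 448e9                        # bytes/s

      def decode_ceiling(n_params: float, bits_per_weight: float = 4.5) -> float:
          bytes_per_token = n_params * bits_per_weight / 8
          return BANDWIDTH / bytes_per_token

      print(f"Gemma 2 2B q4 ceiling: ~{decode_ceiling(2.6e9):.0f} t/s")  # ~306
      ```

      Attention, KV-cache reads, and kernel overheads eat much of the gap, so an observed ~100 t/s against a ~306 t/s ceiling is in the expected range.
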
  • @大支爺
    @大支爺 2 months ago +1

    256Gb DDR5 ECC*4+1

  • @Hypersniper05
    @Hypersniper05 2 months ago +1

    Heugdhdhd

  • @Sanguen666
    @Sanguen666 2 months ago +6

    Can you review SicariusSicariiStuff/2B_or_not_2B? Many ppl hyped the fk out of this model

    • @Sicarius-prod
      @Sicarius-prod 2 months ago +2

      🤗

    • @aifluxchannel
      @aifluxchannel 2 months ago

      What is this model intended for?

    • @Sicarius-prod
      @Sicarius-prod 2 months ago +3

      @@aifluxchannel It's a new model designed for use on edge devices, such as smartphones, with a focus on uncensored creative writing and general assistant tasks. The model is highly compliant with user requests, since the base Gemma it was built on was quite disappointing due to its tendency to moralize and refuse even trivial queries.
      Please feel free to test it at your convenience, and don't hesitate to reach out if you have any further questions or need additional information.