Meta Llama 3.1-405B Explained: The FUTURE of AI is OPEN-SOURCE!

  • Published 12 Sep 2024
  • Meta has finally delivered on their promise to revolutionize the open-source large language model world with the release of the most powerful open-source model in history! Llama 3.1 405B is the largest model Meta has ever released, and its performance numbers are equally impressive!
    Tell us what you think in the comments below!
    Meta Release: llama.meta.com...
    HuggingFace Chat Llama3.1 405B: huggingface.co...

COMMENTS • 48

  • @TheRealUsername
    @TheRealUsername Month ago +6

    Something we haven't tested yet is fine-tuning such a SOTA model on specialized tasks. It's probably the best source of synthetic data for knowledge distillation: 405B is a robust model, with better generalization, more stability, and more sample efficiency. Can't wait to see what the community will do with it; we've just unlocked a myriad of new use cases.
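
    A minimal sketch of that distillation idea, assuming an OpenAI-compatible endpoint serving 405B (the base_url and model id below are placeholders, not a confirmed API):

    ```python
    # Sketch: use Llama 3.1 405B as a "teacher" to generate synthetic
    # prompt/completion pairs for fine-tuning a smaller student model.
    # base_url and model id are assumptions -- any OpenAI-compatible
    # provider hosting 405B would work.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    seed_tasks = [
        "Explain binary search to a beginner.",
        "Write a SQL query that finds duplicate emails.",
    ]

    pairs = []
    for task in seed_tasks:
        resp = client.chat.completions.create(
            model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed id
            messages=[{"role": "user", "content": task}],
            temperature=0.7,
        )
        pairs.append({"prompt": task,
                      "completion": resp.choices[0].message.content})
    # `pairs` can be written to JSONL and used as SFT data for a student.
    ```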

    • @aifluxchannel
      @aifluxchannel  Month ago +2

      I'm waiting to see if this completely surpasses Nvidia's Nemo model, which they created solely for generating new datasets. 8B could also lead to really interesting agentic abilities! What are you planning to use this model for?

    • @Wobbothe3rd
      @Wobbothe3rd Month ago

      Nvidia's Nemo allows synthetic data generation; it came out weeks ago. Things are moving fast...

  • @toadlguy
    @toadlguy Month ago +3

    Ollama has the 8B model in a 4-bit quantized version (ollama run llama3.1), which I have been playing with on an 8GB M1 Mini. It has a slightly higher memory footprint than the Llama 3 model, at least in this configuration, probably due to the much larger context window. This causes it to swap and slows down the tokens per second. It also does not produce noticeably better results so far in my initial testing. It may require the 3-bit version to run in this envelope (which would degrade it further); however, I wouldn't be surprised if Ollama isn't handling something correctly yet, since the model size is about the same (3 vs. 3.1). Having the greater context in such a small model opens up all sorts of new possibilities. Fine-tuned versions of this will be very exciting for local use.
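
    A minimal sketch of that setup with the `ollama` Python package (assumes the model has already been pulled; shrinking num_ctx is one way to test whether the larger default context is what causes the swapping):

    ```python
    # Sketch: query the 4-bit quantized 8B model through a local Ollama
    # server. Lowering num_ctx shrinks the KV cache, which may avoid
    # the swapping seen with the larger default context window.
    import ollama  # assumes `ollama run llama3.1` has pulled the model

    response = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user",
                   "content": "Summarize the Llama 3.1 release in two sentences."}],
        options={"num_ctx": 4096},  # smaller context to fit in 8GB
    )
    print(response["message"]["content"])
    ```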

    • @aifluxchannel
      @aifluxchannel  Month ago

      I've been having a similar experience with my M2 Max MacBook Pro. I think smaller quants are going to need to be tuned somewhat heavily to see consistent gains across any number of potential use cases. Context window is everything.

    • @mirek190
      @mirek190 Month ago

      Under Ollama or llama.cpp, the new Llama 3.1 is not fully implemented yet, and because of that it is worse than Llama 3 right now.

  • @raketemensch6116
    @raketemensch6116 Month ago +4

    Good stuff. Not sure if you got trolled or if you're trolling us with the pronunciation of Claude -- in either case, it's pronounced "CLAWED." As in Claude Rains, Claude Shannon, Claude Monet, Claude Chabrol, Claude Debussy...

    • @aifluxchannel
      @aifluxchannel  Month ago +2

      Hahaha, given the comments it's a bit of a meme. I understand it's CLAWDE haha.

    • @ManjaroBlack
      @ManjaroBlack Month ago

      It’s cloud. As in clouds that rain.

  • @antaishizuku
    @antaishizuku Month ago +1

    I'd argue RAG is still really useful. You can use it to minimize hallucinations and provide additional context beyond what the model was trained on. Thus, even if RAG isn't useful for every situation and has its drawbacks, it will still be a huge asset.
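
    A toy sketch of the pattern (word-overlap retrieval stands in for a real vector store, and the local llama3.1 model is an assumption):

    ```python
    # Toy RAG sketch: retrieve the most relevant snippet, then inject
    # it into the prompt so the model answers from supplied context
    # rather than from (possibly hallucinated) parametric memory.
    import ollama  # assumes a local Ollama install with llama3.1 pulled

    documents = [
        "Llama 3.1 405B was released by Meta in 2024.",
        "The 8B and 70B variants target smaller hardware footprints.",
    ]

    def retrieve(query: str) -> str:
        # Score each document by naive word overlap with the query.
        q = set(query.lower().split())
        return max(documents, key=lambda d: len(q & set(d.lower().split())))

    question = "Which Llama 3.1 sizes suit smaller hardware?"
    context = retrieve(question)
    answer = ollama.chat(
        model="llama3.1",
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}"
                              f"\n\nQuestion: {question}"}],
    )
    print(answer["message"]["content"])
    ```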

  • @pn4960
    @pn4960 Month ago +2

    It looks powerful! I'm very excited.

    • @aifluxchannel
      @aifluxchannel  Month ago

      Llama 3.1 70B is what I'm most impressed with for now!

  • @Luizfernando-dm2rf
    @Luizfernando-dm2rf Month ago +1

    Ah yes, the mysterious Clude model 😶‍🌫
    But in all seriousness, Llama 3.1 is BIG news for open source, thanks Meta

    • @aifluxchannel
      @aifluxchannel  Month ago

      Meta delivered, and it's a great day for progress in open source!

  • @Arcticwhir
    @Arcticwhir Month ago +1

    I view benchmark results, especially HumanEval and coding, as graded on a logarithmic scale, in the sense that the higher these benchmark scores are, the more each point matters: the 3-point difference between 89 and 92 is much more drastic than, let's say, between 50 and 53.
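
    The intuition checks out if you look at the relative reduction in error rate instead of the raw delta:

    ```python
    # The same 3-point gain removes far more of the *remaining* errors
    # near the top of the scale than in the middle.
    def relative_error_reduction(old_score: float, new_score: float) -> float:
        old_err, new_err = 100 - old_score, 100 - new_score
        return (old_err - new_err) / old_err

    print(f"{relative_error_reduction(89, 92):.0%}")  # 27% of errors removed
    print(f"{relative_error_reduction(50, 53):.0%}")  # 6% of errors removed
    ```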

    • @aifluxchannel
      @aifluxchannel  Month ago

      This is a great point! I also think these benchmarks fail to convey a lot of finer context about *how* these models generate code, especially when you're comparing full files versus just "completion".

  • @seanreynoldscs
    @seanreynoldscs Month ago +2

    How much GPU memory should we expect to need to run this locally on something like Ollama, when that becomes available?

    • @raketemensch6116
      @raketemensch6116 Month ago +1

      Anywhere from 200GB to 1TB+ depending on quantization level
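
      Back-of-the-envelope math behind those figures (weights only; the KV cache and activations push real usage higher):

      ```python
      # Rough weights-only memory for a 405B-parameter model at common
      # precisions. Actual serving needs more (KV cache, activations).
      PARAMS = 405e9
      for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
          print(f"{name}: ~{PARAMS * bytes_per_param / 1024**3:,.0f} GB")
      # fp16: ~754 GB, int8: ~377 GB, int4: ~189 GB
      ```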

    • @aifluxchannel
      @aifluxchannel  Month ago +1

      At least 256GB ;)

    • @fontenbleau
      @fontenbleau Month ago

      The bad thing about Ollama is that you must re-encode an already-working model into Ollama's own format; I don't know what that format is or why it's needed, but even that preparation takes huge amounts of storage for large models (usually the Ollama people do it). So your only option is Oobabooga and a GGUF running from system RAM (server motherboards have plenty of RAM slots).

    • @fontenbleau
      @fontenbleau Month ago

      It all depends on the Ollama people's hosting capabilities.

  • @novantha1
    @novantha1 Month ago

    Instead of fine-tuning 405B directly, I would be really interested to see a re-implementation of Let's Verify Step by Step, fine-tuning a smaller model to pick the best approach for things like coding.
    It'd get pretty expensive to run, but I wonder how well a best-of-ten or best-of-one-hundred would do on coding problems.
    Plus, there are obviously the agentic workflow applications where this really could be your coding assistant.
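
    A rough sketch of that best-of-N loop (the endpoint, model id, and scorer are all placeholders; a real run would plug in a trained verifier or a unit-test harness):

    ```python
    # Sketch: sample N candidate solutions from 405B, score each with a
    # verifier, keep the best. score_solution is a hypothetical stand-in
    # for a fine-tuned verifier model or unit tests.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

    def score_solution(problem: str, solution: str) -> float:
        return float(len(solution))  # placeholder scoring only

    def best_of_n(problem: str, n: int = 10) -> str:
        candidates = []
        for _ in range(n):
            resp = client.chat.completions.create(
                model="meta-llama/Meta-Llama-3.1-405B-Instruct",  # assumed id
                messages=[{"role": "user", "content": problem}],
                temperature=0.9,  # high temperature for diverse samples
            )
            candidates.append(resp.choices[0].message.content)
        return max(candidates, key=lambda s: score_solution(problem, s))
    ```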

    • @aifluxchannel
      @aifluxchannel  Month ago

      I think finetunes of 70B will show up first, just because it's cheaper and takes less time. Agentic finetunes will rarely benefit from larger parameter counts, unless the architecture is wildly different from past Llamas.

  • @fontenbleau
    @fontenbleau Month ago +1

    Exo should make that for Linux; it's dumb to use only Macs, which have the same RAM problem. The only problem I see on the market is that 128GB LRDIMMs are quite expensive and rare, so it could be hard to find identical modules later if I invest money in them. At the same time, 64GB LRDIMMs are plentiful and affordable; I could buy 768GB total, like, tomorrow. DDR4, of course, used server RAM for my server board.
    Maybe server farms are migrating to 128s and dumping 64s, there are so many of them.

    • @aifluxchannel
      @aifluxchannel  Month ago +1

      They have a Linux client; it's just not quite as mature. Initially EXO focused on Macs, since the Mac Studio is an incredibly powerful platform and, with proper networking, is actually more cost-effective than the Nvidia A100 80GB.

    • @fontenbleau
      @fontenbleau Month ago

      @@aifluxchannel thanks aiflux! 🤗

  • @asd1asd219
    @asd1asd219 Month ago +1

    Very impressive video! Can I use this model for my medical studies instead of GPT and Claude, since it has better analysis?

    • @aifluxchannel
      @aifluxchannel  Month ago +1

      Instruct models are generally better for "analysis" or flows that require multi-turn reasoning! What kind of AI do you currently use?

    • @asd1asd219
      @asd1asd219 Month ago

      I was a GPT Plus user, but now I use different language models. You know how the companies spring a surprise every day, like "good morning, we introduce our new language model".

  • @AaronALAI
    @AaronALAI Month ago +1

    I'm gonna try quantizing into 4-bit GGUFs. I think I'll only need to offload about 70GB to CPU RAM. I'm curious to see what the inference speeds are.
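
    A minimal llama-cpp-python sketch of that partial-offload setup (the file name and layer split below are assumptions; tune n_gpu_layers to your VRAM):

    ```python
    # Sketch: load a 4-bit GGUF and keep part of it on the GPU while
    # the remaining layers run from CPU RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-3.1-70b-instruct.Q4_K_M.gguf",  # hypothetical file
        n_gpu_layers=40,  # layers on the GPU; the rest stay in CPU RAM
        n_ctx=8192,
    )
    out = llm("Q: What is 2+2? A:", max_tokens=16)
    print(out["choices"][0]["text"])
    ```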

    • @aifluxchannel
      @aifluxchannel  Month ago

      Hoping for anything better than 2-3 tok/s!

  • @gileneusz
    @gileneusz Month ago +1

    3:35 Sure, I have the hardware, like everyone 😭

  • @taylorfrancis9403
    @taylorfrancis9403 Month ago +1

    How can you run the 8B model anywhere?

    • @aifluxchannel
      @aifluxchannel  Month ago

      Technically the smallest quants available can run with as little as 8GB of VRAM.

  • @Michael330167
    @Michael330167 Month ago

    Llama 405B is showing up in Groq's model list, but on completion it produces: "message": "The model `llama-3.1-405b-reasoning` does not exist or you do not have access to it."
    The models "llama-3.1-8b-instant" and "llama-3.1-70b-versatile", however, respond just fine. So is "llama-3.1-405b-reasoning" either not there, or not free on Groq?
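
    For anyone reproducing this, a minimal sketch with the groq Python client that falls back to 70B when the 405B id is missing or gated (error handling is deliberately generic; the exact exception type isn't assumed):

    ```python
    # Sketch: try the 405B endpoint on Groq, falling back to 70B when
    # the model id is unavailable (reproduces the error quoted above).
    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment

    for model in ["llama-3.1-405b-reasoning", "llama-3.1-70b-versatile"]:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "Hello"}],
            )
            print(model, "->", resp.choices[0].message.content[:60])
            break
        except Exception as err:
            print(model, "failed:", err)
    ```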

    • @aifluxchannel
      @aifluxchannel  Month ago

      These are Groq-specific quants; I'd imagine they're just having trouble scaling their infra today. I encountered a similar issue with Mistral's Codestral, specifically the API that was "free", which in reality was only "free" if you had a form of payment hooked up for their conventional paid APIs. I haven't interacted much with Groq outside of that.

    • @toadlguy
      @toadlguy Month ago

      @@Michael330167 Everybody is hitting these models all at once. HuggingChat is busy more often than not, which makes chaining with tool use difficult. But everyone (well, everyone with serious hardware) can spin up a 3.1 405B server, so it will saturate the market and then we should have no problems anymore (as long as CrowdStrike doesn't get involved 😜)

  • @MrDowntemp0
    @MrDowntemp0 Month ago +3

    clood

    • @toadlguy
      @toadlguy Month ago +1

      🤣

    • @aifluxchannel
      @aifluxchannel  Month ago

      CLAWDE lmao

    • @MrDowntemp0
      @MrDowntemp0 Month ago

      @@aifluxchannel I have faith you'll nail it sooner or later.

  • @pigeon_official
    @pigeon_official Month ago

    Bro, I've heard you say literally every possible pronunciation of Claude except the correct one... Clood...

    • @aifluxchannel
      @aifluxchannel  Month ago +1

      Maybe I'm mispronouncing it to solicit comments correcting me ;)

    • @TheOpenSourceMerc
      @TheOpenSourceMerc Month ago

      @@aifluxchannel Aktuallly it's cl-oooorr-da

    • @ManjaroBlack
      @ManjaroBlack Month ago

      I like that you’re pronouncing the silent L. To me it’s so weird when people leave out the L.

  • @thisisashan
    @thisisashan Month ago

    Open-source AI is an absolutely idiotic idea, sorry.
    While you, me, and your brother down the street might be just fine messing around with this right now,
    giving China, Russia, Iran/Iraq, etc. access to the finished products of massive AI-compute LLMs is atrocious.
    Yes, this will be a lot of fun and very interesting to mess around with. No, the devil doesn't have the same good intentions as you do.
    As a programmer of 27 years, I doubt having access to this excites people much more than it does me.
    But as these AI models approach perfection, it becomes more and more necessary to cut off nations who could use these models for harmful purposes.
    Cutting these nations off from compute, as we did with China and Nvidia products, falters if we simply give them the finished product, as Zuckerberg just did.
    TBH, this basically confirms to me that his wife has always been a Chinese spy. If you doubt that or think it's nutty, you should look into her background.

    • @aifluxchannel
      @aifluxchannel  Month ago +1

      Should we also classify linear algebra?