Mistral Small 3 - The NEW Mini Model Killer

  • Published 30 Jan 2025

COMMENTS • 45

  • @kenchang3456 · 33 minutes ago

    Indeed, good to see Mistral is active again. Thanks Sam.

  • @BrandonFoltz · 9 hours ago +21

    I use the "expensive models" to create prompt templates, tools, other functions, example outputs, etc., then use those in calls to free open-source models. Makes them even better. ☺
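A minimal sketch of that two-tier workflow (the template text and all names below are hypothetical, not from the comment):

```python
# Hypothetical two-tier workflow: a frontier ("expensive") model drafts the
# prompt template once, offline; a free open-source model fills it per request.
TEMPLATE = (
    "You are a support assistant. Classify the ticket below as one of "
    "[billing, bug, feature].\n"
    "Ticket: {ticket}\n"
    "Label:"
)

def build_prompt(ticket: str) -> str:
    """Render the frontier-authored template for the cheap local model."""
    return TEMPLATE.format(ticket=ticket)

print(build_prompt("I was charged twice this month."))
```

The expensive model is only in the loop at authoring time, so per-request cost stays at open-model prices.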

  • @MrLeo000 · 7 hours ago +9

    I will NEVER get tired of the initial "Ok. So..."

  • @caeras1864 · 6 hours ago +4

    Thanks Sam. I really enjoy your content. You basically distill what you need to know into a nice 10 minute segment, have nice small code examples, explain concepts well, and avoid all of the hyperbole around AI.

  • @Nik.leonard · 5 hours ago +10

    Now I'm waiting for a Deepseek-R1-Distill-Mistral-Small. That will be FIRE.

    • @geneanthony3421 · 5 hours ago

      They already have different sizes for it down to 1B

    • @Gilgamesh12382 · 4 hours ago

      This is much better than the other distilled models so far.

    • @geneanthony3421 · 3 hours ago

      @@Nik.leonard Sadly, I've got a 3080 Ti and this model is still too much for it. DeepSeek at 14B works fine, though.

  • @Protorobo1 · 1 hour ago

    Thank you. Does it handle nested structured output?
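One quick way to check nested structured output yourself: hand the model a nested JSON schema and verify the reply parses into nested objects. The schema and reply below are illustrative, not from the video:

```python
import json

# An illustrative nested schema of the kind used for constrained decoding
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {  # nested object: the case the question asks about
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string"},
            },
        },
    },
}

# A model reply that conforms to the schema parses into nested dicts
reply = '{"name": "Ada", "address": {"city": "London", "country": "UK"}}'
parsed = json.loads(reply)
print(parsed["address"]["city"])  # → London
```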

  •  3 hours ago

    Great insight in the conclusion

  • @abhijitkadalli6435 · 8 hours ago

    Can you point me to resources on RoPE (2:23)? What is it?

    • @samwitteveenai · 8 hours ago +2

      RoPE stands for Rotary Positional Embeddings; it allows you to fine-tune the model to increase the length of the context window.
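The rotation RoPE applies can be sketched roughly as follows (a toy illustration, not Mistral's actual implementation):

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotary positional embedding for one vector at position `pos`.
    Each dimension pair (2i, 2i+1) is rotated by angle pos * base**(-2i/d),
    so dot products between rotated vectors depend on relative position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

print(rope([1.0, 0.0, 1.0, 0.0], pos=0))  # position 0 → identity rotation
```

Context-extension tricks generally work by rescaling those angles (e.g. changing `base` or interpolating positions) and then fine-tuning, which is why RoPE models can have their windows stretched.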

  • @johang1293 · 7 hours ago +3

    Wish they hadn't labeled this "small" and had left that label for models that can run on edge devices, 3B and smaller.

    • @ZedDevStuff · 5 hours ago

      Yeah for a second I was excited :')

    • @Gstreng · 5 hours ago

      Isn't that more the "tiny" label?

    • @ZedDevStuff · 5 hours ago

      @@Gstreng "Tiny", as far as my experience goes, is basically 1.5B and lower.

  • @imqqmi · 2 hours ago

    Now that GEITje is gone, is this a good Dutch model?
    Would be nice as daily driver model.

  • @neoncyberia · 10 hours ago +5

    Good. I need a workhorse I can install on my machine.

  • @erniea5843 · 5 hours ago

    Yes!

  • @IvarDaigon · 1 hour ago

    I really wish they would design their models to fit into standard GPU memory sizes instead of using weird numbers. I'll have to wait till I try it, but I bet it's just slightly too large to run on a 12 GB video card at Q4 or a 24 GB card at Q8. 32 GB cards exist, but they are ridiculously expensive and hard to get at the moment.
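A back-of-envelope check of that worry, assuming the model's ~24B parameters and a rough 20% overhead for KV cache and activations (both numbers are loose assumptions):

```python
def est_vram_gb(n_params_billion: float, bits_per_weight: int,
                overhead: float = 1.2) -> float:
    """Rough VRAM estimate: quantized weight bytes times an overhead
    factor for KV cache and activations (the 1.2 is a loose guess)."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for bits in (4, 8):
    print(f"24B @ q{bits}: ~{est_vram_gb(24, bits):.1f} GB")
```

By this estimate a 24B model needs roughly 14 GB at Q4 and 29 GB at Q8, matching the comment's hunch that 12 GB and 24 GB cards fall just short.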

  • @7T7Soulz · 3 hours ago

    mistral moe is dope

  • @bhushanj · 8 hours ago +1

    Thanks for this video!! Just a request, if it works for you: could you please create a video on the safe, secure use of open-source LLMs? Points like data residency, memory cleaning, etc. It would be really useful.

    • @samwitteveenai · 8 hours ago +1

      This is an interesting idea. Can you elaborate a bit more? Do you mean like data hygiene?

    • @bhushanj · 7 hours ago

      @@samwitteveenai Thanks for the reply, and for finding it an interesting idea. Before we embark on LLM usage, it would be good to understand various aspects of data security and privacy: encryption of data in transit and at rest, whether the LLM saves the data it processes to produce output, how we can ensure LLM cache/memory cleanup, and whether data is stored locally or across boundaries and regions. I'm looking for general guidance on safe LLM usage from a data-security perspective. Your expert views would be really helpful, thank you!!

    • @DJ-dh3oe · 5 hours ago

      @@bhushanj If you're downloading an open-source model, then all of those factors are exactly the same as for building any other application. The LLM itself doesn't save or cache anything, so it's just the same considerations as securing a server, database, etc.

  • @yannickpezeu3419 · 9 hours ago

    This couldn't come at a better time for me! 🎉🎉🎉

  • @ibrahimhalouane8130 · 9 hours ago +1

    Great video. I guess the only annoying thing about Mistral is that they have their own architecture, so you can't really use it as a drop-in replacement via the usual APIs. I tend to wish for English-only small (7B or less) LLMs, especially for function-calling stuff, whereas large ones should definitely be polyglot. Feels like users are getting greedy after DeepSeek-R1, aren't they?

    • @ayrengreber5738 · 9 hours ago +1

      Someone usually Llamafies it pretty quickly

    • @samwitteveenai · 9 hours ago +3

      lol, yes, people want R1 680B quality in 7B models

    • @DonG-1949 · 6 hours ago

      what did you mean about the architecture problem? just curious

    • @ibrahimhalouane8130 · 5 hours ago

      As far as I know, in order to use recent versions of Mistral's models you'll need to install their own Mistral library; you can't just use Hugging Face's API unless you do some hacks to get around that somehow!

  • @wag6181 · 3 hours ago

    The small is still big

  • @diwanliwe · 10 hours ago +5

    you're so fast 🏎💨

    • @samwitteveenai · 9 hours ago +2

      It helps when you get it a bit early 😀

  • @RaviAnnaswamy · 9 hours ago +2

    Wow, you did this so fast, and it's very thorough and informative.
    Subscribing.

  • @DavidSilva-c9p · 7 hours ago

    At this point, we expect nothing less than R1's capabilities.

    • @thornelderfin · 7 hours ago +2

      I don't think I can fit DeepSeek's 650 GB into my VRAM. On the other hand, I'm downloading the 19 GB Mistral (Q6) right now to test it. DeepSeek-R1 is only available if you either send everything you have directly to the CCP, or use US and EU providers at $8 per million tokens.

    • @DonG-1949 · 6 hours ago

      @@thornelderfin You can't download it on a virtual GPU?

  • @saadowain3511 · 8 hours ago +1

    No Arabic... this is bad.

    • @samwitteveenai · 8 hours ago

      Yes, unfortunately quite a few languages are missing.

  • @holdenmcgroin8917 · 6 hours ago

    It's such a joke to talk about this trash