Llama 3.2 goes Multimodal and to the Edge

  • Published 27 Sep 2024

COMMENTS • 20

  • @toadlguy
    @toadlguy 1 day ago +5

    These small models are not only good for low memory situations but also where you can have multiple models run at once. Work is being done where you can run 405B by loading and unloading layers (epochs) in small memory configurations to run more advanced models much slower and run these small models for routing and interactivity at the same time. All this could be done locally in situations where you don’t want to send the data it is working with (like personal information) off the device.

    • @samwitteveenai
      @samwitteveenai 1 day ago

      Very good point about multiple models, totally agree.

  • @SirajFlorida
    @SirajFlorida 1 day ago +3

    11B and 90B make sense because it's 3B and 20B vision parameters respectively? That's what I would guess right off the bat.

  • @Nick_With_A_Stick
    @Nick_With_A_Stick 1 day ago +1

    It kind of makes me sad that Meta trained Llama 2 on audio and pictures, made it so it can output audio and pictures, and then nerfed the model by removing the decoders for “safety” reasons. They released it even though L3 was already out, and now they are using the Llama 3 version of that model in their app, where you can talk to it as if it were GPT-4 Omni.

  • @IvarDaigon
    @IvarDaigon 1 day ago

    Another obvious use case for the mini models is moderation. APIs like OpenAI require you to make a moderation call before making the inference call, which means two round trips to the server before you get any content you can show to the user. If you can do moderation on device, then you only need one round trip, making your realtime chats appear faster to the user.
    Moderation, routing, summarization = mini models for the win.
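
A minimal Python sketch of the round-trip argument in the comment above. All function names here (`chat_with_remote_moderation`, `chat_with_local_moderation`, `fake_moderate`, `fake_remote_infer`) are hypothetical stand-ins, not any provider's API: `fake_moderate` is a keyword check standing in for a small on-device model, and `fake_remote_infer` stands in for a hosted inference call. The sketch only counts network round trips; it does not measure real latency.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    text: str
    round_trips: int  # number of network calls made before the user sees content


def fake_moderate(prompt: str) -> bool:
    """Hypothetical moderator: returns True if the prompt is allowed."""
    return "forbidden" not in prompt.lower()


def fake_remote_infer(prompt: str) -> str:
    """Hypothetical stand-in for a hosted model call."""
    return f"echo: {prompt}"


def chat_with_remote_moderation(
    prompt: str,
    remote_moderate: Callable[[str], bool],
    remote_infer: Callable[[str], str],
) -> Result:
    # Moderation happens on the server: that is the first round trip.
    if not remote_moderate(prompt):
        return Result("[blocked]", round_trips=1)
    # Inference is the second round trip.
    return Result(remote_infer(prompt), round_trips=2)


def chat_with_local_moderation(
    prompt: str,
    local_moderate: Callable[[str], bool],
    remote_infer: Callable[[str], str],
) -> Result:
    # Moderation runs on device, so it costs no network round trip.
    if not local_moderate(prompt):
        return Result("[blocked]", round_trips=0)
    # Only the inference call goes over the network.
    return Result(remote_infer(prompt), round_trips=1)


if __name__ == "__main__":
    print(chat_with_remote_moderation("hello", fake_moderate, fake_remote_infer))
    print(chat_with_local_moderation("hello", fake_moderate, fake_remote_infer))
```

In this sketch the allowed prompt costs two network calls with server-side moderation and one with on-device moderation, and a blocked prompt never leaves the device in the local case.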

  • @comfixit
    @comfixit 1 day ago

    Yes please, a video on fine-tuning these models would be awesome. Also, videos showing the tiny models running on edge devices and/or in the browser would be super cool as well.

  • @autoflujo
    @autoflujo 1 day ago

    Nice video! It would be awesome if you could make a video on how to fine-tune these small models.

  • @chenqu773
    @chenqu773 1 day ago

    Thank you for this quick update, Sam! BTW, "QWen" should probably be pronounced as "qian wen", as in the original Chinese, with the hidden meaning of "capable of answering thousands of questions". 😀

    • @samwitteveenai
      @samwitteveenai 1 day ago

      lol I tried to pronounce it like their devrel guy does. Is there audio somewhere I can hear it?

  • @i_accept_all_cookies
    @i_accept_all_cookies 1 day ago

    This is great news! Can't wait to start using the lightweight models.

  • @aminzarei1557
    @aminzarei1557 1 day ago

    Hey Sam, great video 👌
    Will be waiting for fine-tuning the 1B for JSON in and out

  • @ibrahimhalouane8130
    @ibrahimhalouane8130 1 day ago

    No intro, no music, right to the point. Amazing work, Sam. I'd like to know your opinion of unsloth.

    • @samwitteveenai
      @samwitteveenai 1 day ago +1

      I love unsloth. It's a simple but good way for people to do LoRAs.

  • @jmspat14b
    @jmspat14b 1 day ago

    A video on how to finetune these small models would be great! By the way, being from Denmark I always test these models in Danish as well as in English. Llama 3.2 3B is by far the best small, multilingual model I have tested - far better than Gemma 2 2B!

    • @pozytywniezakrecony151
      @pozytywniezakrecony151 1 day ago

      they all kinda fail in Polish :D but well, in English it's quite nice

    • @samwitteveenai
      @samwitteveenai 1 day ago

      ohh that is super interesting to know. Is Danish one of the 8-9 prioritized languages, or is it just getting better at European languages in general, I wonder.

    • @pozytywniezakrecony151
      @pozytywniezakrecony151 23 hours ago

      @samwitteveenai It appears it doesn't understand some language rules, or I am using models that are too small. I tried o1-mini:latest / DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.i1-Q4_K_M.gguf:latest /
      Qwen2.5-14B_Uncencored-Q6_K_L.gguf:latest. For example, I asked all of them to write me 4-verse poems in Polish about "Bocian" (stork). They do create some correct lines, but in the middle they mix in wrong words here and there, and most of the time the poem doesn't hang together as a story of sorts. Here is o1-mini: "Bocian wysoki, z wody unosi się swobodnie,
      Czerwone dzióbki biały kaptur trzyma.
      Lecąc lecieli nad pól i lasów brzegi,
      Piekne słońce oświetla mu skrzydła jak diamenty."
      "Lecąc lecieli" sounds bad :) It's like "Flying they flew ....", the same word repeated. However, I think this one is quite good compared to the other output, 3/4 actually.

    • @jmspat14b
      @jmspat14b 23 hours ago

      @samwitteveenai I feel the need to clarify that its abilities in Danish are, of course, nowhere near what they are in English. But it is the first small language model I have tried that is able to produce a Danish summary of a Danish text which is mostly correct and coherent. It does still suffer from making up words (I think it sometimes confuses Danish with Swedish and Norwegian), but Gemma 2 and other models are much worse in this regard.
      Also, its knowledge regarding Denmark is very limited, as you would expect for such a small model, I suppose. If, for example, I ask it to list the last 5 prime ministers of Denmark, it only knows the current one and hallucinates the rest. When asking it to list the last 5 governors of any US state, I find that it typically gets 4-5 right.

    • @samwitteveenai
      @samwitteveenai 23 hours ago +2

      I looked up both of these languages and they aren't among the main multilingual priority languages. A friend I spoke to pointed out that there aren't huge numbers of Facebook users there, so that might be a reason. Meta themselves benefit from all the data they have for training etc. I think that also drives how they prioritize some of their training decisions.