New Mac Mini M4 running SD1.5, FLUX, and Ollama (Qwen-coder 2.5 14B model)

  • Published 10 Jan 2025

COMMENTS • 60

  • @bestof467 · 1 month ago · +9

    Please test the M4 Max with FLUX dev + video generation.

  • @RDUBTutorial · 10 hours ago

    Hope you are working on a WaveSpeed tutorial for FLUX and LTX for us Mac users who could really use the speed boost.

  • @noahleaman · 1 month ago · +13

    I’m getting about 5 minutes using the full (not quantized) FLUX dev model on an M3 Max with 64GB. 16GB isn’t enough to hold the model in memory alongside all the other system stuff that’s already using that 16GB. You should note the swap utilization while running a generation to get an idea of how much data is being moved between memory and disk (quick terminal commands for this are sketched after this thread).

    • @greendsnow · 1 month ago · +1

      Sounds like you've been ripped off

    • @paultparker · 1 month ago

      @greendsnow What would you recommend instead at this system's price?

    • @noahleaman · 1 month ago · +1

      I wouldn’t recommend a Mac if you are only looking for fast image generation at high res with full-sized FLUX dev models. (Not why I have this particular system, btw.) But having 64GB or more of memory that the GPU can access directly is good for large models of any kind.

    • @HermanWillems · 1 month ago · +1

      @paultparker Second-hand NVIDIA GPUs with 24GB. My old 3090 is way, way faster.

    • @sysadmin-info · 14 days ago

      @noahleaman The RTX 5090 will be presented soon. GDDR7 RAM. :)
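
A quick way to check the swap utilization mentioned at the top of this thread, using only built-in macOS tools:

    # report current swap usage (total / used / free)
    sysctl vm.swapusage
    # or sample system-wide memory pressure while a generation runs
    memory_pressure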

  • @inicsf · 1 month ago · +3

    Great video! My Mini M4 Pro is due in a few days. Can't wait. I didn't purchase it with local LLMs in mind, but being able to do something with them is a plus. The extra memory bandwidth should help. For reference, my close-to-retirement RTX 3090 (power-limited to 65%) has an eval rate of 52 tokens/s with the 14B model and 28 tokens/s with the 32B one. Nonetheless, the M4/M4 Pro power efficiency is exceptional.
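
For anyone curious how a card gets power-limited like that, nvidia-smi can cap board power; a rough sketch (the 227 W value assumes a 3090's 350 W default limit, so adjust for the actual card):

    # show current / default / max power limits
    nvidia-smi -q -d POWER
    # cap board power to ~65% of a 3090's 350 W default (requires root)
    sudo nvidia-smi -pl 227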

  • @FruityFoxArt · 24 days ago

    Thanks bro, I was going to buy it to replace my M2 Pro. But now I feel I should just stick with what I have and use online solutions instead of running offline.

  • @carstenli · 1 month ago · +6

    Ollama does not yet support MLX, unlike LM Studio. MLX has been reported to provide up to 40% faster inference than llama.cpp, which is currently the backend used by Ollama.
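
For anyone who wants to try MLX directly rather than wait on Ollama, the mlx-lm package ships a small CLI; a minimal sketch (the mlx-community model tag is an assumption for illustration, so substitute whatever MLX conversion you prefer):

    # install Apple's MLX LLM tooling
    pip install mlx-lm
    # run a one-off generation with an MLX-quantized model (tag assumed)
    mlx_lm.generate --model mlx-community/Qwen2.5-Coder-14B-Instruct-4bit \
      --prompt "Write a quicksort function in Python"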

  • @garywang04 · 8 days ago

    Helpful for my research on the Mac mini M4 Pro... :) Thanks!!!

  • @emmanuelpineda3630 · 1 month ago · +3

    This video is really useful, thank you!

  • @brulsmurf · 1 month ago · +5

    Qwen2.5-Coder 14B model, same question on a GTX 1080 Ti (the verbose-mode command that prints these stats is sketched after this thread):
    total duration: 14.016461817s
    load duration: 35.829686ms
    prompt eval count: 35 token(s)
    prompt eval duration: 137.915ms
    prompt eval rate: 253.78 tokens/s
    eval count: 314 token(s)
    eval duration: 13.697011s
    eval rate: 22.92 tokens/s

    • @LumiLumi1300 · 22 days ago

      So I’d be better off buying an old 1080 Ti instead of a Mac mini? Damn. Saved me $1500.

    • @brulsmurf · 22 days ago

      @LumiLumi1300 Purely for LLMs, yes.
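
For anyone wanting to reproduce stats like the ones @brulsmurf posted above, Ollama prints them when run in verbose mode; a minimal sketch (the model tag is assumed to match the one tested in the video):

    # print timing and throughput stats after each response
    ollama run qwen2.5-coder:14b --verbose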

  • @zl5667 · 1 month ago · +1

    Really useful information 🎉

  • @TheDanEdwards · 1 month ago

    Thanks for doing this and showing the usage stats. Seems like buying the extra RAM is worth it for any kind of generative AI or LLMs.

  • @HermanWillems · 1 month ago · +2

    Honestly, I just tested Ollama the same way you did and got almost the same output from the same prompt. But with my old NVIDIA 3090 it was more than 100 tokens/s. So even though this M4 is fast, it's nowhere near as fast as a decent NVIDIA GPU.

  • @FGCVidz · 1 month ago · +2

    I'd be worried about my SSD long-term, using that much swap consistently. Good video though. I will check out your tutorial on running FLUX and installing ComfyUI.

  • @karthik448 · 1 month ago · +1

    For use purely as a code assistant with Continue.dev, I've heard these Macs are not that great, since prompts get reprocessed repeatedly as the user iterates to tweak the code output. For normal development tasks, do you find the performance OK with the Qwen coder models on this specific Mac?

  • @MaxF88 · 1 month ago · +2

    Good test!
    Just wondering whether to get this base Mac mini (never used a Mac b4 😜) or stick with my Windows AMD 5600 + 7800 XT desktop with 16GB RAM?
    What would you recommend for local LLM purposes? I suppose upgrading the Mac mini to 32GB of RAM or more would enable access to larger models at the same tokens/s?

    • @tech-practice9805 · 1 month ago · +4

      I think a discrete GPU is still faster, but its VRAM is limited. So the Mac's unified RAM is an advantage that enables fitting larger model sizes (a rough sizing rule is sketched just below).
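
A rough sizing rule behind that trade-off: quantized weights need about params × bits ÷ 8 bytes, plus a few GB for the KV cache and the OS; a back-of-envelope sketch:

    # approximate weight size of a 14B model at 4-bit quantization, in GB
    echo $(( 14 * 4 / 8 ))   # -> 7 GB of weights; allow a few GB more for cache and OS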

  • @luismigueloteromolinari5406 · 1 month ago · +3

    This is a great test! Would you test the FLUX PixelWave model? The q4 dev version is 6 GB and works on my Mac with a 16GB AMD GPU.

    • @tech-practice9805 · 1 month ago · +1

      Thanks! It should have similar speed to the schnell q4.

  • @SK-S2N · 15 days ago

    Thanks Ollama

  • @Phrozends · 7 days ago

    What is the name of the RAM monitor app?

    • @tech-practice9805 · 5 days ago

      It's called 'stats'; see my previously uploaded video for details: ua-cam.com/video/USpvp5Uk1e4/v-deo.html
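
If Homebrew is installed, 'stats' can usually be added with a single command (assuming the cask name is still 'stats'):

    # install the open-source menu-bar system monitor
    brew install --cask stats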

  • @odebroqueville · 1 month ago

    Thank you for sharing this video. 👍 I now understand that the base model probably won't let me run open LLMs beyond 14 billion parameters locally. The only remaining question I have is whether 24GB of unified memory can handle 32 billion parameters.

    • @paultparker · 1 month ago · +2

      Based on my research, I would not expect this unless quantized down to q5. Are you OK with that quality trade-off? Also, instead of just bumping the RAM, I would bump up to the Pro chip, which includes the RAM bump, more GPU cores, and, I believe, better memory bandwidth.
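
The arithmetic behind that q5 estimate, counting weights only (KV cache, context, and macOS itself need headroom on top):

    # 32B parameters at 5 bits per weight, in GB
    echo $(( 32 * 5 / 8 ))   # -> 20 GB of weights out of 24 GB total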

  • @Pablo-Ramirez · 1 month ago

    Thank you very much for your video and work. I was looking for something like this to decide which one to buy for running LLMs locally. Thanks.

  • @saravanannatarajan6515 · 1 month ago

    Nice one! Can you please let us know what app you use to show the CPU, GPU, and RAM percentages in the video?

    • @tech-practice9805 · 1 month ago · +1

      Thank you! The tool is called 'stats'. I uploaded a video for it at ua-cam.com/video/USpvp5Uk1e4/v-deo.html

    • @saravanannatarajan6515 · 1 month ago

      @tech-practice9805 Thanks a lot. I actually installed iStat Menus, but it seems it's not open source. So I will try this 'stats' one, thanks for that.

  • @cbuchner1 · 1 month ago · +1

    I just got the M4 Pro Mac Mini with 64GB of RAM for the best AI inference.

    • @tech-practice9805 · 1 month ago

      Is there a 64GB version?

    • @cbuchner1 · 1 month ago · +3

      Versions with the M4 Pro chip are available with 24GB, 48GB, or 64GB.

    • @firworks · 1 month ago

      @@cbuchner1 How is it working for you? I've been considering the same.

    • @devmely · 1 month ago

      Why don't you buy a PC with an RTX card? BTW, I like Macs, but for AI stuff they are expensive.

    • @cbuchner1 · 1 month ago · +4

      @devmely I am coming from that side. Fed up with power consumption, fan noise, inflated GPU pricing, and the lack of VRAM in the consumer space.

  • @ParameshChockalingam · 1 month ago

    How did the SDXL base model perform?

    • @tech-practice9805 · 1 month ago · +1

      I didn't test SDXL, but it should be faster than FLUX.

  • @PawFromTheBroons · 1 month ago

    That's where/when we Mac users have CUDA envy.

  • @dindayalsingh2613 · 1 month ago

    For the price it comes at, I think it's worth it compared to an RTX.

  • @PranavSarda-d5y · 1 month ago

    Did you try FLUX dev on this? Can it generate, and if so, how much time does it take?

    • @tech-practice9805 · 1 month ago · +1

      The GGUF FLUX dev works. Generation time is about 5x that of FLUX schnell.
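
For anyone wanting to try the GGUF dev model, quantized FLUX files are community-hosted on Hugging Face; a sketch (the repo and file names are assumptions, so check what is currently published):

    # download a 4-bit FLUX dev GGUF into ComfyUI's model folder (names assumed)
    huggingface-cli download city96/FLUX.1-dev-gguf flux1-dev-Q4_K_S.gguf \
      --local-dir ComfyUI/models/unet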

  • @Xiaoyi_Wang · 1 month ago

    Oh my god.

  • @kornerson23 · 1 month ago · +3

    Schnell took more than 2 minutes for 4 steps!!! Those are horrible times, and unfeasible for using dev. Apple needs to improve their tech a LOT or they are going to be behind all the time. (And I am an Apple lover.) But these times... are really bad.

    • @tech-practice9805 · 1 month ago · +3

      There should be room for optimization; LLMs already work quite well on ARM CPUs.

    • @michaelvarney. · 1 month ago · +4

      And how much does the GPU alone that you are comparing to the Mini cost?

    • @greendsnow · 1 month ago

      No they don't; they should start selling upgrades for $500 each. 24GB of RAM? +$500!

    • @paultparker · 1 month ago

      @kornerson23 Can you substantiate this with a comparably priced machine that has better performance? What about one with comparable performance?

    • @yangho8 · 1 month ago · +3

      This tiny machine is half the price of an RTX 4090.

  • @garen591 · 3 days ago

    What a lousy M4 chip... FLUX took 3-4 minutes to generate?!? You can throw that into the garbage bin for any AI inference work.